The Description of Supervisory Behavior *


Edwin A. Fleishman 


USAF Air Training Command, Human Resources Research Center, 
Lackland Air Force Base, Texas ** 


Previous research in the area of leadership 
has to a large extent been concerned with 
postulated traits that leaders should possess, 
or with over-all evaluations of leadership. 
The leader’s actual behavior has been largely 
ignored. More recent research has concluded 
that leadership is to a great extent situational, 
and that what is effective leadership in one 
situation may be ineffective in another. It 
therefore seems desirable to have available a 
method of describing leadership behavior 
which can be applied to many different situa-
tions. If this were possible then different
leadership patterns could be related to criteria 
of effectiveness in a wide variety of group 
situations in which leaders function. 

There have been some recent attempts to 
develop methods for the description of leader- 
ship behavior. This article is concerned with 
one such attempt which was carried out within 
the framework of the Leadership Studies at 
the Personnel Research Board of Ohio State 
University. The primary emphasis in this 
article will be to describe the development of 
a Supervisory Behavior Description question- 
naire for use in an industrial situation. 


Developmental Background of the Instrument 


The Leader Behavior Description. The
Supervisory Behavior Description is based on
the Leader Behavior Description Question-
naire originally developed by Hemphill and
the staff at the Personnel Research Board (2).
The questionnaire contained 150 items which
described how people in leadership positions
operate in their leadership role.¹ The re-
spondent marked, for each item, how fre-
quently the leader did what each item de-
scribed (e.g., always, often, occasionally,
seldom, never).

* This study was carried out while the writer was
at the Personnel Research Board, Ohio State Uni-
versity, in cooperation with the International Har-
vester Company.

** Perceptual and Motor Skills Research Labora-
tory. The opinions or conclusions contained in this
report are those of the author. They are not to be
construed as reflecting the views or indorsement of
the Department of the Air Force.

A major problem in this endeavor was the 
classification of the items into meaningful 
categories of leader behavior. The 150 items 
were derived from over 1,800 original items 
which were written and then classified by 
“expert judges” into the following nine a 
priori “dimensions” of leadership behavior: 

1. Integration: acts which tend to increase
cooperation among group members or decrease
competition among them.

2. Communication: acts which increase the
understanding and knowledge about what is
going on in the group.

3. Production emphasis: acts which are
oriented toward volume of work accomplished.

4. Representation: acts which speak for
the group in interaction with outside agencies.

5. Fraternization: acts which tend to make
the leader a part of the group.

6. Organization: acts which lead to dif-
ferentiation of duties and which prescribe
ways of doing things.

7. Evaluation: acts which have to do with
distribution of rewards (or punishment).

8. Initiation: acts which lead to changes
in group activities.

9. Domination: acts which disregard the
ideas or persons of members of the group.

¹ An earlier approach at the Personnel Research
Board developed modified job analysis procedures
for investigating types of organizational activities
engaged in by persons in high organizational positions.
These methods have been summarized by Stogdill
and Shartle (5) and by Shartle (4).

An example of an item assigned to the Inte- 
gration area was “He encourages group mem- 
bers to work as a team.” An example of one 
assigned to the Domination area was “He 
insists that everything be done his way.” 

Subsequent administration of the form 
yielded adequate reliabilities for the nine di- 
mension scores (.71 to .88) when groups filled 
it out as describing their own leader. More- 
over, group members were consistent in how 
they described the same leader. However, the 
striking feature of repeated use of the ques- 
tionnaire in various types of situations was 
the lack of independence of the dimensions. 
Most of the intercorrelations were between .50 
and .80. 

Item analysis also showed that an item as- 
signed to one dimension by a priori methods 
might just as easily correlate more highly with 
scores on dimensions to which the item was 
not assigned. Some reorganization of the 
items, into relatively more independent cate- 
gories of leader behavior, therefore, seemed 
necessary. 

Factor Analysis and Revision of the Leader
Behavior Description. In order to identify
empirically the factor structure of the ques-
tionnaire, a factor analysis of the items was
undertaken.² The questionnaire was admin-
istered to 300 Air Force crew members who
described their airplane commanders. The
Wherry-Gaylord Iterative Factor Analysis
Procedure (6, 7) was utilized in the analysis
of the items.³ The factors extracted were
rotated to orthogonality and then to simple
structure. The analysis revealed two major
factors present, together with two minor fac-
tors. The major factors were defined as “Con-
sideration” and “Initiating Structure.”

² This analysis was performed by B. J. Winer
under a Human Resources Research Laboratories
contract directed by Hemphill at the Personnel Re-
search Board.

³ This procedure does not require the item inter-
correlations, but starts with item-sub-test correlations.
That this procedure yields the same factors as
Thurstone’s Centroid Method has been empirically
demonstrated (6).

Items in the “Consideration” dimension 
were concerned with the extent to which the 
leader was considerate of his workers’ feelings. 
It reflected the “human relations” aspects of 
group leadership. 

Items in the “Initiating Structure” dimen- 
sion reflected the extent to which the leader 
defined or facilitated group interactions to- 
ward goal attainment. He does this by plan- 
ning, communicating, scheduling, criticizing, 
trying out new ideas, etc. 

The minor factors were tentatively labeled 
“Production Emphasis” and “Social Sensi- 
tivity.” 


Pre-Test on an Industrial Population 


New keys were developed to score the 
questionnaire along these factor dimensions. 
Items with the highest loadings and purest 
factor structure were selected for each key. 
It was felt that scoring the questionnaire 
along these four dimensions would yield lower 
intercorrelations between the dimensions and 
would thus give measures of more independent 
aspects of the leader’s behavior. A 136-item 
Supervisory Behavior Description question- 
naire was administered to a pre-test sample 
of 100 International Harvester foremen at the 
Company’s Central School in Chicago. These 
foremen, representing 17 different plants, used 
the questionnaire to describe the behavior of 
their own supervisors. The questionnaires 
were scored along the new factor dimensions 
derived from the Air Force sample. The pur- 
pose of this industrial pilot-study was to find 
out how applicable these new scales were to 
the industrial sample, and to determine what 
further revision might be necessary. 

Dimension Reliabilities and Intercorrela-
tions. Intercorrelations of the dimension
scores showed that they still had substantial
overlap with one another when applied to this
industrial population. The intercorrelations
were between .56 and .80, with corrected split-
half reliabilities between .77 and .95. It
seemed possible that the categories of leader
behavior which were most independent in this
industrial sample might be somewhat different
from those found most independent in the Air
Force data.


Table 1

Items Selected for the Revised Form of the Supervisory Behavior Description ¹

                                                                                            Orthogonal Factor Loading
                                                                                            “Consid-     “Initiating
Item                                                                                        eration”     Structure”

“Consideration” Revised Key
He refuses to give in when people disagree with him.                                          -.68
*He does personal favors for the foremen under him.                                            .40
He expresses appreciation when one of us does a good job.
He is easy to understand.
*He demands more than we can do.
*He helps his foremen with their personal problems.
*He criticizes his foremen in front of others.
He stands up for his foremen even though it makes him unpopular.
He insists that everything be done his way.
He sees that a foreman is rewarded for a job well done.
He rejects suggestions for changes.
*He changes the duties of people under him without first talking it over with them.
He treats people under him without considering their feelings.
He tries to keep the foremen under him in good standing with those in higher authority.
He resists changes in ways of doing things.
*He “rides” the foreman who makes a mistake.
*He refuses to explain his actions.
*He acts without consulting his foremen first.
**He stresses the importance of high morale among those under him.
He backs up his foremen in their actions.                                                      .62
He is slow to accept new ideas.                                                                .66         -.06
He treats all his foremen as his equal.                                                        .66          .28
He criticizes a specific act rather than a particular individual.
He is willing to make changes.
He makes those under him feel at ease when talking with him.
He is friendly and can be easily approached.
He puts suggestions that are made by foremen under him into operation.
He gets the approval of his foremen on important matters before going ahead.

“Initiating Structure” Revised Key
**He encourages overtime work.
*He tries out his new ideas.
He rules with an iron hand.
He criticizes poor work.
**He talks about how much should be done.
*He encourages slow-working foremen to greater effort.
He waits for his foremen to push new ideas before he does.
He assigns people under him to particular tasks.
He asks for sacrifices from his foremen for the good of the entire department.
He insists that his foremen follow standard ways of doing things in every detail.
He sees to it that people under him are working up to their limits.
*He offers new approaches to problems.
He insists that he be informed on decisions made by foremen under him.
He lets others do their work the way they think best.
**He stresses being ahead of competing work groups.
**He “needles” foremen under him for greater effort.
He decides in detail what shall be done and how it shall be done.
**He emphasizes meeting of deadlines.
*He asks foremen who have slow groups to get more out of their groups.
**He emphasizes the quantity of work.

Loadings legible in the source but no longer aligned with their items: .65; .20, -.10, -.20, -.18, -.20, .17, -.07, .00, -.17, .36, -.17, -.33.

¹ Items not starred used the format: 1. always; 2. often; 3. occasionally; 4. seldom; 5. never. Items preceded
by an asterisk (*) used the format: 1. often; 2. fairly often; 3. occasionally; 4. once in awhile; 5. very seldom.
Items preceded by a double asterisk (**) used the format: 1. a great deal; 2. fairly much; 3. to some degree;
4. comparatively little; 5. not at all.
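
The corrected split-half reliabilities quoted for the pre-test can be illustrated with a short sketch. It assumes the correction is the Spearman-Brown formula, which the article names only in the footnote to Table 2; the function name and the sample half-test correlation of .85 are illustrative and are not taken from the study.

```python
def spearman_brown(r_part: float, length_factor: float = 2.0) -> float:
    """Estimate the reliability of a test `length_factor` times as long
    as the part whose correlation is `r_part` (2.0 for split halves)."""
    denominator = 1.0 + (length_factor - 1.0) * r_part
    if denominator <= 0.0:
        raise ValueError("correlation too low for this length factor")
    return length_factor * r_part / denominator


# Hypothetical correlation between the odd and even halves of one scale.
half_test_r = 0.85
print(round(spearman_brown(half_test_r), 2))  # 0.92
```

A half-test correlation is always stepped up by the correction, which is why a dimension can show a corrected reliability above .90 even when its two halves correlate somewhat lower.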

Item Analysis. In order to clarify this 
problem and to revise the questionnaire for in- 
dustrial use a statistical analysis was carried 
out at the item level. Two kinds of informa- 
tion were obtained concerning each of the 136 
items in the Supervisory Behavior Description 
questionnaire. First, the distributions of re- 
sponses among the five choices for each item 
were considered. Second, tetrachoric correla- 
tions of every item with each dimension total 
score were calculated to give indices of the in- 
ternal consistency of the dimensions and to 
reveal the sources of overlap between the di- 
mensions. Thus, coefficients were not only 
computed between an item and its own dimen- 
sion total score, but with each of the other 
three dimension total scores to which it had 
not been assigned. 
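
As a rough illustration of an item-dimension tetrachoric coefficient, the sketch below uses the cosine-pi approximation on a fourfold table. The article does not say how its tetrachorics were computed, so this approximation is a stand-in rather than the author's procedure, and the dichotomizing of item and dimension score at some cut point, like the cell counts, is hypothetical.

```python
import math


def tetrachoric_cos_pi(a: int, b: int, c: int, d: int) -> float:
    """Cosine-pi approximation to the tetrachoric correlation.

    Fourfold table (counts of respondents):
        a = item high, dimension score high
        b = item high, dimension score low
        c = item low,  dimension score high
        d = item low,  dimension score low
    """
    concordant = a * d
    discordant = b * c
    if concordant == 0 and discordant == 0:
        raise ValueError("table is degenerate")
    if discordant == 0:
        return 1.0
    if concordant == 0:
        return -1.0
    return math.cos(math.pi / (1.0 + math.sqrt(concordant / discordant)))


# Hypothetical counts for one item against its own dimension total.
print(round(tetrachoric_cos_pi(40, 10, 10, 40), 2))
```

Running the same function against each of the four dimension totals in turn would give the kind of item-by-dimension table the paragraph describes.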

This analysis revealed that most of the 
items correlated highly with the dimension to 
which they were assigned. However, it was 
also evident that most of the items correlated 
highly with one or more dimensions to which 
they were not assigned. 

Following the Wherry-Gaylord rationale (6, 
7), the item-dimension correlations were con- 
sidered factor loadings of the items on the four 
oblique (correlated) dimensions. In order to
compare the loadings with those obtained from
the Air Force population, transformation to 
orthogonality was accomplished and it ap- 
peared, by inspection, that this transformation 
brought the loadings more in line with the 
original factors derived from the factor analy- 
sis. Item loadings increased on dimensions to 
which they were assigned and decreased on 
other dimensions. This seemed especially true 
for the two major factors (Consideration and 
Initiating Structure). Further preliminary
rotations were then made with the primary 
objective of rotating the items originally in 
the two minor factors into more independent 
clusters. It appeared that this might not be 
possible, and in the light of the high correla- 
tions between these factors and the other two, 
their utility as separate dimensions was ques- 
tioned for this population. Practically all the 
variation could be accounted for by the two 
major dimensions. 

The Revised Questionnaire. Based on the 
item-dimension loadings derived from this in- 
dustrial population, two revised scoring keys 
were developed, one for “Consideration” and
one for “Initiating Structure.” Criteria for
item inclusion were: (1) the item should have 
a high loading with the dimension in which it 
was to be included; (2) the item should have 
as close to zero loading as possible on the 
other factor; and (3) items which did not 
discriminate among supervisors (most re- 
spondents picking the same alternates) were 
rejected. 
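
The three criteria can be sketched as a simple filter. The article gives no numerical cut-offs, so the thresholds below (`min_own`, `max_other`, `max_modal_share`), the `Item` record, and the sample items are all hypothetical and serve only to show how the criteria combine.

```python
from dataclasses import dataclass


@dataclass
class Item:
    label: str
    own_loading: float    # loading on the dimension the item is keyed to
    other_loading: float  # loading on the other dimension
    modal_share: float    # share of respondents choosing the most popular alternative


def select_items(items, min_own=0.50, max_other=0.30, max_modal_share=0.80):
    """Keep items that (1) load highly on their own dimension, (2) load
    near zero on the other, and (3) spread respondents across alternatives."""
    return [
        item for item in items
        if abs(item.own_loading) >= min_own
        and abs(item.other_loading) <= max_other
        and item.modal_share <= max_modal_share
    ]


# Hypothetical candidate items.
candidates = [
    Item("A", own_loading=0.66, other_loading=-0.06, modal_share=0.40),
    Item("B", own_loading=0.35, other_loading=0.05, modal_share=0.45),   # fails (1)
    Item("C", own_loading=-0.68, other_loading=0.55, modal_share=0.50),  # fails (2)
    Item("D", own_loading=0.70, other_loading=0.10, modal_share=0.95),   # fails (3)
]
print([item.label for item in select_items(candidates)])  # ['A']
```

Absolute values are used so that negatively worded items with large negative loadings are retained.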

Twenty-eight items best meeting these cri- 
teria for “Consideration” and 20 items for 
“Initiating Structure” were selected. Table
1 presents the items finally selected for the 
revised form. The loadings given are those 
derived from this industrial population. 

It can be seen that most of the items as-
signed to each key have high loadings with
that dimension and insignificant loadings with
the other. In addition, one more step was
carried out. It was possible to select items
for the “Initiating Structure” key so that some
items had small negative loadings on “Con-
sideration,” and others had small positive
loadings on “Consideration.” It was hoped
that the total effect of this would be to cancel
out further the unwanted variance in the “Ini-
tiating Structure” key due to these cumula-
tive small loadings on “Consideration.”

The items in each key were, as before, ran- 
domly distributed through the questionnaire. 


Administration of the Revised Form 


This 48-item revised Supervisory Behavior 
Description was then administered to another 
comparable sample of 122 foremen in one of 
the International Harvester Company’s plants. 
Again they were to describe their own super- 
visors. Assurances were again given that no 
one in the company would see their answers. 

Table 2 presents some of the results. 

From the results on this sample, it appeared 
that the scores on the two dimensions were 
now independent of each other. 

Another index of the utility of the instru-
ment is the agreement among different re-
spondents who describe the same supervisor’s
behavior. The variation in scores can be di-
vided into that between descriptions of dif-
ferent supervisors and that within descriptions
of the same supervisor. This “within descrip-
tion” variation represents lack of agreement
between respondents describing the same su-
pervisor. The analysis of variance revealed
significantly less variation among descriptions
of the same supervisor than between descrip-
tions of different supervisors.⁴ This appears
to be further evidence of the objectivity of
this questionnaire procedure.


Table 2

Means, Standard Deviations, Range, Reliabilities, and
Intercorrelations of the Dimension Scores of the
Revised Supervisory Behavior Description
(N = 122)

                        Consideration    Initiating Structure

No. of Items                  28                 20
Mean                          82.3               51.5
Standard Deviation            15.5                8.8
Range ¹                    22 to 106          13 to 68
Reliability ²                 .92                .68
Intercorrelation                       -.02

¹ In this form, the alternatives for each item were
weighted from zero to four. Thus, the highest possible
score was 112 for Consideration and 80 for Initiating
Structure.

² Split-half correlations corrected to full length of each
dimension by the Spearman-Brown formula.
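
The scoring arithmetic behind the ranges in Table 2 can be sketched as follows, assuming only what the table footnote states: each item alternative carries a weight from zero to four and a dimension score is the sum over its items. How the weights are assigned to the alternatives of negatively worded items is not described in the article, so the sketch takes already-weighted responses as input; the function names are illustrative.

```python
def dimension_score(item_weights):
    """Sum the weights (0 to 4) earned on the items of one dimension."""
    for weight in item_weights:
        if weight not in (0, 1, 2, 3, 4):
            raise ValueError(f"weight out of range: {weight!r}")
    return sum(item_weights)


def highest_possible(n_items, top_weight=4):
    """Ceiling of a dimension score when every item earns the top weight."""
    return n_items * top_weight


print(highest_possible(28))  # 112 for the 28 Consideration items
print(highest_possible(20))  # 80 for the 20 Initiating Structure items
```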

The questionnaire was also administered to 
a sample of 394 workers who described the be- 
havior of their own foreman. In this case the 
reliabilities of the scales were .98 and .78 and 
the correlation between them was —.33. It 
will be recalled that the pre-test sample con- 
sisted of people at the foreman level, so it 
might be expected that the correlation be- 
tween dimensions would be somewhat higher 
in this sample of workers. This correlation 
is still considerably lower than had been 
obtained between dimensions with previous 
forms of the instrument. An analysis of vari- 
ance again revealed significant agreement 
among workers describing the same foreman.⁵
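
The between-versus-within comparison can be sketched with a one-way analysis of variance in which each group holds the scores given to one supervisor by his several describers. The conversion of F to epsilon uses one common form of the unbiased correlation ratio; that this is the exact formula intended in footnote 4 is an assumption, and the scores below are invented for illustration.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists.
    Requires at least two groups and some within-group variation."""
    scores = [x for group in groups for x in group]
    grand_mean = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


def epsilon_from_f(f_ratio, df_between, df_within):
    """Epsilon as the square root of df_b (F - 1) / (df_b F + df_w),
    set to zero when F is below 1 (assumed form of the statistic)."""
    eps_squared = df_between * (f_ratio - 1.0) / (df_between * f_ratio + df_within)
    return max(eps_squared, 0.0) ** 0.5


# Hypothetical Consideration scores: three supervisors, three describers each.
descriptions = [[80, 82, 84], [60, 62, 64], [100, 102, 104]]
f_ratio, df_b, df_w = one_way_anova(descriptions)
print(f_ratio, df_b, df_w)
print(round(epsilon_from_f(f_ratio, df_b, df_w), 3))
```

A large F here means that describers of the same supervisor agree closely relative to the differences between supervisors, which is the sense of "agreement" used in the text.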

It appeared that the two dimensions isolated 
were quite meaningful in this industrial situa- 
tion. Apparently a supervisor could be high 
in Consideration without necessarily being 
high or low in the amount of planning, push- 
ing for production, scheduling, or initiating 
behavior engaged in. At least the usual “halo 
effect” from scale to scale that occurs in most 
instruments in this area, seems for the most 
part to have been eliminated. The independ- 
ence of the dimensions has special relevance 
when one considers the relationships of each 
of the two dimensions with some external cri- 
teria of group effectiveness. 

The development of external criteria of 
group effectiveness was far beyond the scope 
of the present study. However, the Industrial 
Relations Department of the plant did have 
available the number of labor grievances filed 
for each of 23 departments during an eight- 
month period. These were reduced to griev- 
ances per worker for each department and cor- 
related against the mean scores derived (from 
descriptions by foremen) for the general fore- 
man in charge of each department. Although 
the N of 23 is pitifully small, and the records
attenuated by many uncontrollable factors,
correlations of -.43 with “Consideration” and
.26 with “Structure” were obtained. Only the
first coefficient is statistically significant. The
trend, however, was for the high grievance de-
partments to be those with supervisors lower
in consideration and higher in the amount
of structuring in their leadership behavior.
These results, of course, are purely suggestive.
A more highly controlled criterion study of
group effectiveness, and of its relationships to
these dimensions, is a program of future research.

⁴ Peters and Van Voorhis (3) suggest the con-
version of F ratios to epsilon (ε), a statistic which
indicates the strength of relationship. For these
results ε = .65 (P < .01) for Consideration and
.47 (.01 < P < .05) for Initiating Structure.

⁵ ε = .72 (P < .01) for Consideration and .64 (P <
.01) for Initiating Structure.
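
Why only the first coefficient reaches significance with 23 departments can be checked with the usual t test for a correlation coefficient. The article does not state which test was applied, so this is a sketch under that assumption; the critical value 2.080 is the two-tailed 5 per cent point of t for 21 degrees of freedom from standard tables.

```python
import math


def t_for_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


CRITICAL_T_DF21_5PCT = 2.080  # two-tailed, from standard t tables

for label, r in (("Consideration", -0.43), ("Structure", 0.26)):
    t = t_for_r(r, n=23)
    print(label, round(t, 2), abs(t) > CRITICAL_T_DF21_5PCT)
```

The grievance criterion itself is simply the number of grievances filed in a department divided by the number of workers in it.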

The instruments have also been found use- 
ful in evaluating a leadership training program 
for foremen in the company and in studying 
relationships of leader behavior with certain 
factors in the social situation in which the 
foremen operate (1). 


Summary 


This paper has described the development
of one approach to the problem of describing
leadership behavior in industry. A question-
naire, based on earlier work by Hemphill, was
constructed. By means of this questionnaire
the leadership behavior of supervisors could
be objectively described. The questionnaire
measures two relatively independent leader-
ship dimensions found meaningful in the in-
dustrial situation, “Consideration” and “Ini-
tiating Structure.”

There is no implication in the study as to
the degree of each kind of behavior that is
desirable or undesirable. Recognizing the
situational nature of leadership, the need for
relating these scales to effectiveness of par-
ticular kinds of groups in well-controlled cri-
terion studies is stressed. Moreover, the study
reported here was confined to supervisors in
one particular company.

The questionnaire at present is regarded 
only as a research instrument for the study of 
leadership behavior. More research applying 
the scales to other industrial situations needs 
to be done before they can be more confidently 
assessed. 


Received May 5, 1952. 
A Validation Study of “How Supervise?”


Joseph Weitz and Robert C. Nuckols 


Life Insurance Agency Management Association, Hartford, Conn. 


The problem of whether or not “How Super- 
vise?” is an intelligence test has been discussed 
in several recent articles. Millard¹ has briefly
discussed these studies and has presented ad-
ditional data showing a relationship between 
this test and intelligence. We have been in- 
terested in determining whether or not “How 
Supervise?” is a test of supervisory ability and 
incidentally have some findings which may be 
relevant to its relations with intelligence. 


Procedure 


A modification of “How Supervise?” was 
taken by 78 District Managers in one life in- 
surance company. These districts are located 
throughout most of the southern and border 
states. The Managers supervise and direct 
the work of varying numbers of agents, rang- 
ing from 8 to 100. 

By arrangement with the Psychological Cor- 
poration the test was modified by taking items 
from forms A and B and combining them into 
a test of 100 items. We used 20 items from 
the section Supervisory Practices, 32 items 
from Company Policy, and 48 items from 
Supervisor Opinions. 

The kinds of changes made in the test were 
these: “Admitting to your workers when you 
make a wrong decision,” was changed to “Ad- 
mitting to your agents when you make a 
wrong decision.” “Requiring supervisors to
submit in writing their reasons for firing or
penalizing any employee,” was changed to
“Requiring Managers to submit in writing
their reasons for firing or penalizing any
agent.” These were the only changes made;
that is, substituting “Manager” for “super-
visor” and “agent” for “worker” or “em-
ployee.”

The 100 items were put together into one
test which we called the Manager’s Inventory.
This test was mailed to 83 managers and, as
we mentioned earlier, 78 of them returned
completed questionnaires. Here the second
difference occurred, that of a change in the
testing conditions. The instructions for each
section were the same as in the original test
with the exception that “Manager” was sub-
stituted for “supervisor.” However, it was
truly self-administered with no time limit.
The Managers signed the questionnaire. This
permitted validation against certain criteria
data for each district.

¹ Millard, K. A. Is How Supervise? an intelligence
test? J. appl. Psychol., 1952, 36, 221-224.


The Criteria 


Many different criteria were used. These 
included three production criteria: production 
of ordinary insurance, industrial insurance, 
and ordinary increase. (For those of us who 
do not know much about insurance terminol- 
ogy, it should suffice to say that these are 
three measures related to volume of sales.) 
We used as another criterion the number of 
men who terminated in each district during 
1951. This figure was corrected for size of 
the district. We also used the four-year turn- 
over, again corrected for district size, for the 
period 1947 through 1950. (This criterion 
has an odd-even year reliability of .77.) An-
other criterion was the persistency of the busi-
ness sold, i.e., the average lapse ratio for each
district. This might be thought of as the 
quality of the business. 

In addition to the above criteria we had 
certain biographical data on each Manager. 
The only part of this information which we 
will discuss in the present paper is the highest 
school grade completed. 


Scoring of the Test 


A number of different scores were obtained 
for each part and for the total test. For the 
total and for each of the parts we obtained the 
number of items right, the number wrong, the 
number right minus the number wrong, and 
the number of question marks. The correct 
answers were obtained by using the key origi-
nally devised for each of the appropriate items
in “How Supervise?”


Table 1

Correlation of Scores vs. Criteria Measures

Columns (criteria): Ordinary Production Per Man; Lapse Ratio Per Man; Ordinary Increase Per Man; Industrial Increase Per Man; 1951 Turnover; 4 Year Turnover; Education Level.

Rows (scores): Number Right, Number Wrong, Right-Wrong, and Number of ?, each given for Part I, Part II, Part III, and Total.

[The individual coefficients are too garbled in this copy to be aligned with their rows and columns.]

r = .22 significant at 5% level.
r = .29 significant at 1% level.
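
The four scores described in this section can be sketched as below. The answer symbols ("D", "U", "?"), the item names, and the treatment of an omitted item as a question mark are assumptions made for illustration; the actual key belongs to the published test and is not reproduced here.

```python
def score_inventory(answers, key):
    """Count rights, wrongs, rights minus wrongs, and question marks
    for one respondent, given a dict of keyed answers."""
    right = wrong = question_marks = 0
    for item, keyed_answer in key.items():
        given = answers.get(item, "?")  # assumption: an omission counts as "?"
        if given == "?":
            question_marks += 1
        elif given == keyed_answer:
            right += 1
        else:
            wrong += 1
    return {
        "right": right,
        "wrong": wrong,
        "right_minus_wrong": right - wrong,
        "question_marks": question_marks,
    }


# Hypothetical four-item key and one Manager's answers.
key = {"item1": "D", "item2": "U", "item3": "D", "item4": "U"}
answers = {"item1": "D", "item2": "D", "item3": "?", "item4": "U"}
print(score_inventory(answers, key))
```

Applying the same function to the items of each part separately, and then to all 100 items, gives the part scores and the total scores.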


Results 


The results are shown in Table 1. It can 
be seen that most of the correlations are be- 
low the five per cent level of significance with 
the exception of the scores vs. education where 
more of the correlations are above the five per 
cent level than could be expected by chance 
alone. 
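
The two significance thresholds printed with Table 1 can be recovered from the t distribution for 78 cases. The sketch assumes a two-tailed test with N - 2 = 76 degrees of freedom; the critical t values are approximate figures from standard tables, and the function name is illustrative.

```python
import math


def critical_r(critical_t: float, n: int) -> float:
    """Smallest |r| that reaches significance for a sample of size n,
    given the critical t for n - 2 degrees of freedom."""
    df = n - 2
    return critical_t / math.sqrt(critical_t ** 2 + df)


# Approximate two-tailed critical t values for 76 df (standard tables).
T_5_PER_CENT = 1.992
T_1_PER_CENT = 2.642

print(round(critical_r(T_5_PER_CENT, 78), 2))  # 0.22
print(round(critical_r(T_1_PER_CENT, 78), 2))  # 0.29
```

With sixteen scores correlated against each criterion, fewer than one coefficient per criterion (16 x .05 = 0.8) would be expected to pass the five per cent level by chance if the coefficients were independent, which they are not strictly, since the part and total scores overlap.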

After finding no over-all significant rela- 
tionship between the scores and the criteria, 
we did an item analysis on half of the cases. 
Using high and low district termination rate 
as the criterion we isolated those items, about
twenty in all, which seemed to differentiate 
these two groups to some extent. In those 
items which differentiated the groups, the an-
swers predominantly given by the low termina-
tion group were scored as correct. We now
applied our new scoring key to the other half 
of the sample. It did not cross-validate; on 
the other half of the sample there was no 
relationship between the score obtained with 
the new key and termination rate. 
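
The split-sample procedure can be sketched as follows: a key is built on one half of the cases and then used to score the other half, where its relationship to termination rate is checked afresh. The article says only that the selected items "seemed to differentiate" the groups, so the `min_gap` threshold, the answer symbols, and the data are hypothetical.

```python
def build_key(responses, low_termination, min_gap=0.25):
    """Key each item to the answer favored by the low-termination group,
    keeping only items where that answer's share exceeds the
    high-termination group's share by at least `min_gap`."""
    key = {}
    for item in responses[0]:
        low = [r[item] for r, is_low in zip(responses, low_termination) if is_low]
        high = [r[item] for r, is_low in zip(responses, low_termination) if not is_low]
        best_answer, best_gap = None, 0.0
        for answer in sorted(set(low)):
            gap = low.count(answer) / len(low) - high.count(answer) / len(high)
            if gap > best_gap:
                best_answer, best_gap = answer, gap
        if best_answer is not None and best_gap >= min_gap:
            key[item] = best_answer
    return key


def score(response, key):
    """Number of keyed items answered in the keyed direction."""
    return sum(response[item] == answer for item, answer in key.items())


# Hypothetical development half: two low- and two high-termination districts.
development = [
    {"q1": "D", "q2": "D"},
    {"q1": "D", "q2": "U"},
    {"q1": "U", "q2": "D"},
    {"q1": "U", "q2": "U"},
]
is_low_termination = [True, True, False, False]

new_key = build_key(development, is_low_termination)
print(new_key)                                   # {'q1': 'D'}
print(score({"q1": "D", "q2": "U"}, new_key))    # a hold-out case scored with the new key
```

Because the key is chosen to fit the development half, some apparent discrimination there is expected by chance alone; only the hold-out half gives a fair test, and in the study the key failed that test.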


Conclusion 


If the minor modifications of the test did 
not change “How Supervise?” materially, it 
would look as if this test is not valid in this 
situation for predicting agent turnover or pro- 
duction, both of which we feel should be re- 
lated to supervisory ability. From our results 
the only thing this test seems to relate to is 
educational (intelligence?) achievement. 


Received November 28, 1952. 
Early publication. 
With the popularization of the finding from
World War I of a positive relationship be- 
tween occupational level and intelligence test 
score, the notion developed that for each oc-
cupation there is an optimal level of intelli-
gence. This belief led to a series of studies,
particularly during the 1920's, which in gen- 
eral showed a curvilinear relationship to exist 
between scores on intelligence tests and labor 
turnover. Those individuals on a particular 
job who earn intelligence test scores at ap- 
proximately the average of the group tested 
tend to remain on the job a longer time than 
those who earn scores at either extreme (e.g.,
5).

Little attention has been given to the prob- 
lem of the relationship between labor turnover 
and scores on types of tests other than those 
of intelligence. With tests of “specific” apti- 
tudes the primary interest has been in dis- 
covering the correlations between test scores 
and some criterion measures of job proficiency 
or success in training. It is possible that the 
criterion of length of service on the job might 
have a curvilinear relationship with scores on 
specific aptitude tests as length of service has 
been shown to have with intelligence tests. If 
this were the case then there would be reason 
to question the notion that optimal intelligence 
test scores for various jobs are indicative of 
the “intellectual requirements” of the jobs. 
Intellectual factors other than those measured 
by intelligence tests would have to be con- 
sidered. The study reported here was under- 
taken to investigate the nature of the relation- 
ship between scores on tests not ordinarily 
considered intelligence tests and labor turn- 
over. 


Methods and Procedure 


The subjects used in the present investiga- 
tion were taxicab drivers. At the time they 
applied for work they were administered a 
number of tests as a part of the hiring pro-
cedure. To some extent the scores on these
tests were taken into account in the decision
regarding employment. But other factors 
such as age, nature of previous experience, and 
scores on an interest questionnaire also entered 
into the hiring decision. 

Those men who were ultimately hired were 
divided into two groups, those who stayed on 
the job for three months or more and those 
who left in less than three months. No dif- 
ferentiation was made between individuals 
who were separated for cause and those who 
left voluntarily. The number of enforced 
separations was very small, and resignations 
appeared in some cases not to be wholly volun- 
tary but rather as a means for avoiding dis- 
ciplinary action. The only individuals not 
included were those who were terminated be- 
cause of illness, called to the armed services, 
or transferred to other jobs within the com- 
pany. 

All of the tests utilized were of the paper 
and pencil variety. The tests are listed in the 
accompanying tables. All three arithmetic 
tests involved computations but varied in the 
complexity of the problems presented. The 
Speed of Reactions tests involved making dif- 
ferential checking responses in accordance 
with pre-established rules to presentations of 
letter stimuli varying in spatial organization. 
In Test I the rules were given on each page 
and in Test II the rules had to be remem- 
bered. In the Dotting and Tapping tests, 
scores were based upon the speed with which 
dots were placed in small printed circles by 
means of a pencil. In the Dotting test, preci- 
sion of movement was more of a factor than 
in the Tapping test because the circles were 
much smaller in size. The Judgment of Dis-
tance test required judgments about the rela-
tive distance of pictured objects based pri-
marily on cues of perspective and interposition
of objects. The Distance Discrimination test 
required judgments about the relative lengths 
of lines. In the Mechanical Principles test, 
problems involving knowledge of mechanical 
functions and principles were presented. A
more detailed description of these tests has 
been given elsewhere (2). 

All men did not take all tests. In the pres- 
ent analysis the numbers of cases per test 
ranged from 218 to 441. Scores on each test 
were transmuted into normal standard scores 
on a nine-point scale following the procedure 
utilized in the Aviation Psychology Program 
of the Air Force (3). In standardizing the 
tests on this scale all applicants were utilized, 
whether they were hired or not. The distribu- 
tions of scores of cases utilized in the subse- 
quent analyses are given in Table 1. 
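
The nine-point normal standard score described above is commonly called a stanine. As a rough sketch only, assuming the conventional stanine percentages (4, 7, 12, 17, 20, 17, 12, 7, 4) rather than any figures given in this article, a percentile rank could be converted as follows. The function name and the handling of boundary values are illustrative choices, not the procedure of the cited program.

```python
from bisect import bisect_right

# Cumulative percentages that close stanines 1 through 8.
# These are the conventional values and are an assumption here;
# the article does not list them.
CUMULATIVE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile_rank: float) -> int:
    """Map a percentile rank (0-100) to a nine-point standard score."""
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must lie between 0 and 100")
    # Count how many cut points lie at or below the rank, then shift to 1..9.
    return bisect_right(CUMULATIVE_CUTS, percentile_rank) + 1


if __name__ == "__main__":
    for rank in (2, 30, 50, 70, 98):
        print(rank, stanine_from_percentile(rank))
```

Because the scale was standardized on all applicants, the percentile rank would be taken within the full applicant group, not within the hired group alone.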


Results 


For each score on the various tests the per 
cent of individuals leaving the job in less than 
three months is given in Table 2. These data 
are shown graphically in Figure 1. For three 
of the tests, Speed of Reactions II, Judgment
of Distance, and Mechanical Principles, no re- 
lationship of any kind is apparent between 
test scores and turnover. For the remaining 
tests, curvilinear relationships occur and tend 
to be of the U-shaped kind found with in- 
telligence tests, that is, individuals earning 
either high or low scores are more likely to 
quit the job sooner than those earning scores 
in the middle of the range. 

The most consistent and striking relationship
between test scores and turnover holds for
the Intermediate Arithmetic test. For Complex
Arithmetic, Simple Arithmetic, Speed of
Reactions I, Dotting, Tapping, and Distance
Discrimination, scores and turnover seem to
be correlated to about the same degree. No
relationship is apparent between turnover and
scores on Speed of Reactions II, Judgment of
Distance, and Mechanical Principles.


Table 1

Numbers of Hired Taxicab Drivers Earning Various Scores on the Several Tests

[Cell values did not survive extraction. Rows: Complex Arithmetic, Intermediate Arithmetic, Simple Arithmetic, Speed of Reactions I, Speed of Reactions II, Dotting, Tapping, Judgment of Distance, Distance Discrimination, Mechanical Principles. Columns: Score.]


Table 2

Per Cent of Taxicab Drivers Leaving Their Jobs in Less Than Three Months in Relation to Scores on Various Tests

[Only fragments of the score columns survive; entries are shown in the order extracted. The first surviving column heading reads "1 to 3".]

Complex Arithmetic          42  17
Intermediate Arithmetic     29
Simple Arithmetic           27
Speed of Reactions I        27
Speed of Reactions II       3  33
Dotting                     32
Tapping                     29
Judgment of Distance        40
Distance Discrimination     32
Mechanical Principles       42

Scores on all three of the arithmetic tests 
are related to turnover, as are scores on three 
of the four speeds tests (Speed of Reactions 
I, Dotting, and Tapping), and one of the two 
spatial tests (Distance Discrimination). It is 
therefore difficult to arrive at any generaliza- 
tion concerning the general factors in the tests 
which give the best forecast of turnover. 

On the seven tests that are related to turn- 
over, the optimal score varies between 5 and 6. 
It is 5 or very close to that value for Inter- 
mediate Arithmetic and Simple Arithmetic; 
about 6 for Complex Arithmetic, Speed of Re- 
actions I, Dotting, and Tapping; and 5.5 for 
Distance Discrimination. Thus the optimal 
score on each of these tests is equivalent to or 
a little higher than the average score of this 
particular group of applicants. 


Discussion 


From the findings of the present study, it
is apparent that scores on some tests which in
content are quite different from intelligence
tests are related to labor turnover in the same
manner as are scores on intelligence tests.
Not only is the nature of the relationship the
same, as a U-shaped relation, but the optimal
scores, scores where turnover is at a minimum,
fall at about the same place in the
distribution of scores as do the optimal scores
on intelligence tests. Some of the tests utilized
in the present investigation, such as tapping
and dotting, obviously measure quite
simple functions which are unrelated to those
measured by ordinary intelligence tests. The
nature of the relationship with turnover, however,
is the same.

Fig. 1. Per cent of taxicab drivers leaving their jobs in less than three months in relation to scores on various tests.

The optimal scores on intelligence tests 
found in previous investigations, together with 
the relationship between intelligence test 
scores and occupational level, have been taken 
to indicate the “intellectual requirements” of 
jobs. In the present study we find the same 
kind of optimal scores with tests quite dif- 
ferent from intelligence tests. Furthermore it 
has been found that even with tests of simple 
functions a similar relationship exists between 
scores and occupational level (4). It is not 
altogether certain, then, that the hierarchical
levels of occupations with respect to intelligence
test score are to be accounted for solely
on the basis of “intellectual requirements.”
Finally, it can be pointed out that in some instances
turnover and intelligence test scores,
though correlated, are not related in the U-shaped
fashion. In Table 3 are data we have
reported in a different form elsewhere con- 
cerning the relationship between intelligence 
test scores and turnover among bus drivers 
(1). In this occupation the large number of 
terminations was again the result of voluntary
separation. From Table 3 it can be seen
that turnover is at a minimum at the high
score levels and as the scores decrease there
is an increasing rate of turnover.


Table 3

Turnover Among Bus Drivers in Relationship to Intelligence Test Score

Score      % Leaving Under 6 Months
50-60      33
40-49      49
30-39      57
20-29      00
0-19       62

It is not clear just exactly what generaliza- 
tions can be drawn concerning the nature of 
the relationship between test scores and turn- 
over. Certainly the use of the concept of “in- 
tellectual requirements” does not seem to be 
a satisfactory explanation. That is, the idea 
that turnover is a function of the distance, 
either plus or minus, of the person’s intelli- 
gence from the mean intelligence for the job 
is not necessarily borne out. Just what types 
of aptitude tests give satisfactory predictions 
of turnover and what the nature of the rela- 
tionship is between turnover and various apti- 
tude qualifications is still obscure. 


Summary 


Ten tests measuring several kinds of apti- 
tudes were administered to groups of 218 to 
441 taxicab drivers. For seven of the tests a 
U-shaped relationship was found between test 
scores and turnover, those individuals earning 
either high or low scores being more likely to
leave the job than those earning scores around
the average of the group. Since this relation- 
ship is very similar to that found between 
scores on intelligence tests and turnover, it 
was concluded that the notion of “intellectual 
requirements” as an explanation of the U- 
shaped relationship between turnover and in- 
telligence test scores is not wholly satisfactory. 


Received May 2, 1952.


A test of Structural Dexterity emerged after
a long period of testing in Technical High 
School of Buffalo, New York. Earlier ex- 
periments with the assembly of mechanical 
objects gave way to the construction of pro- 
gressively complex structures of three dimen- 
sions. 

In this test two different lengths of metal 
bars and pins comprised the unit parts. These 
were manually inserted and built upon a board 
divided into sections with holes drilled for 
each unit structure. The subject built each 
structure by interpreting the size and posi- 
tion of the parts from perspective sketches 
presented on a card. Features of the test 
follow: 



1. A configuration of holes was adopted 
which became the basis for twelve different 
structures. The complete test utilized all 
areas of the board twice. 

2. The progression from simple to complex 
structure gradually advanced the subject from 
one to two, and then from two to three level 
structures; from right angle to oblique posi- 
tions; and from firmly built structures to 
movable balanced structures which required 
greater finger dexterity. 

3. The score was determined by adding up 
the total number of pins and bars correctly 
placed. Deductions were made for errors. 
Testing time: 14 minutes. See accompanying 
photo in Figure 1. 


Fig. 1. Photograph of apparatus and sketches of the Structural Dexterity Test.


The Criteria


Five criteria were developed for this test.
The first criterion, (C1), was the average of
two ratings by a machine shop instructor.
These ratings were not upon any specific job
or project, but covered specific working traits,
after three or four months of shop work. The
second criterion, (C2), was the time in clock
hours for the student to complete an assigned
project of a small “C” clamp. Instructions
were uniform and a detailed drawing and list
of operations were furnished each student.
The third criterion, (C3), was an averaged
evaluation of layout, precision, and quality of
work on the same “C” clamp by two judges,
A and B, who were uninformed of the shop
experience and behavior of individual students.
The following scale provided the objective
evaluation:


Clamp screw

Threading          (0, 1, 2)
Knurling           (0, 1, 2)
Total length       (0, 1)
Knurl length       (0, 1)
Hole, drilled      (0, 1)
Thread tested      (0, 1)
Chamfer            (0, 1)
Total score

Clamp

Outside contour    (0, 1, 2)
Inside contour     [illegible]
Filing             (0, 1, 2)
Finish             (0, 1, 2)
Hole, true         [illegible]
Total score

Total, clamp and screw

Outside and inside contours were judged with the aid 
of a special steel template on a 7%” tolerance basis. 


The fourth criterion, (C4), was the averaged
evaluation, C3, plus a time bonus. This bonus
was developed from the time in hours for each
job and was determined by the shop instructors:
it weighted time compared with quality
of work on a 1:2 ratio. The fifth criterion,
(C5), was the shop teacher’s evaluation on the
objective scale plus the time bonus.


The Group and the Measures Used 


A group was chosen which could be readily 
and precisely rated on their shop work. All 
pupils registered in first year Machine Shop 
were selected. This comprised a sub-group of 
62 students in the 9th grade of the Electrical 
Course and another sub-group of 38 10th 
grade students of the Mechanical Course. 
Scores were available for these sub-groups on 
the following measures: Henmon-Nelson In- 
telligence Quotient, (IQ); Space and Nu- 
merical Ability, (DS and DN); on the Dif- 
ferential Aptitude Tests, (DAT); The Struc- 
tural Dexterity Test described, (SD); the 
Purdue Pegboard, (PP), using the total score 
of Right plus Left plus Both Hands; and a 
test of Repetitive Operations, (RO), com- 
prising nuts and bolts to be fastened to a block 
with the aid of an end wrench. 

A comparison of the 9th and 10th grade sub-
groups was undertaken and the means of the
scores on the Structural Dexterity and the 
Purdue Pegboard tests were found to be sig- 
nificantly greater for the 10th grade than for 
the 9th grade sub-group. SD correlated with 
C, ....44 for the 9th grade and .17 for the 
10th grade sub-group. The Purdue Pegboard 
as well as both Differential Aptitude tests gave 
consistently low correlations (.08 to .30) with 
C, for both sub-groups. A definite age differ- 
ence of one year and three months existed be- 
tween the sub-groups, and a marked difference 
in age correlations appeared: Age with C, 
gave .15 for 9th grade and —.32 for 10th 
grade sub-group. Since these were the only 
unusual differences noted in the sub-groups, 
the combination seemed justifiable. 


Reliability and Validity 


Evidence of the reliability of the criteria 
was obtained. As previously stated, the third 
and fourth criteria were based upon the evalu- 
ations of two judges. The fifth criterion was 
based upon the evaluation of the shop instruc- 
tor. Correlations of these evaluations follow: 








             Instructor
Judge A         .76
Judge B         .72











Validity of a Structural Dexterity Test 


Table 1

Intercorrelations in the Prediction of Several Shop Success Criteria. Pearson formula used for all coefficients. N = 100

         RO      PP      DN      DS
SD      .48     .44     .13     .29
RO              .19    -.05    -.05
PP                     -.02     .14
DN                              .18

[The remaining rows and columns (IQ, Age, and criteria C1 to C5) did not survive extraction.]





Using the correlation between Judges A and 
B, the Spearman-Brown formula yields a co- 
efficient of .87 for the group of 100 students 
evaluated. This may be considered the re- 
liability of the third criterion, and the mini- 
mum reliability of the fourth criterion. 

The reliability of the SD test was deter- 
mined by a method similar to the split-half 
technique. The coefficient for a group of 92 
students in 9th and 10th grades was .88. Ap- 
plying the Spearman-Brown formula the en- 
tire test would give .94. 
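
The Spearman-Brown step-up used in the two paragraphs above can be checked with a short sketch. The function names are illustrative. The second function simply inverts the formula to show what inter-judge correlation would be implied by the reported .87; that implied value is a back-calculation, not a figure stated in the article.

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Reliability of a measure lengthened k times, given reliability r."""
    return k * r / (1 + (k - 1) * r)


def implied_component_r(stepped_up: float, k: float = 2.0) -> float:
    """Invert Spearman-Brown: the component correlation behind a stepped-up value."""
    return stepped_up / (k - (k - 1) * stepped_up)


if __name__ == "__main__":
    # Half-test coefficient of .88 reported for the SD test.
    print(round(spearman_brown(0.88), 2))
    # Correlation between Judges A and B implied by the reported .87.
    print(round(implied_component_r(0.87), 2))
```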

An intercorrelation of factors and criteria 
is presented in Table 1. 

With the exception of C2, the time criterion,
it is significant that the Structural Dexterity 
Test has higher validity than any of the other 
selected tests. More significant results might 
be obtained with age held constant. 


Summary 


1. A test of structural dexterity shows sig- 
nificant differentiation in the performance of 
9th and 10th grade technical high school stu- 
dents. The reliability by odd-even correlation 
employing the Spearman-Brown formula was 
.94 for a group of 92 students. 


2. This test appears to be a valid measure 
of mechanical ability in a limited sense. It is 
a definite aid in the prediction of general ma- 
chine shop success. The correlation for 100 
subjects with averaged shop instructors’ rat- 
ings on specific shop traits was .38; with time 
in hours to complete a specific job —.38; with 
averaged evaluation of a specific job by two 
judges .30; with this averaged evaluation plus 
a time bonus .41; and with a shop instructor’s 
evaluation plus a time bonus .51. 

3. This test of structural dexterity shows 
significant overlap with a test of repetitive 
operations (.48) and with the Purdue Peg- 
board, Right plus Left plus Both Hands score, 
(.44). 

4. With multiple correlation formula, based 
upon the data presented, it was found that 
four selected factors, Structural Dexterity, 
Repetitive Operations, Space Relations (Dif- 
ferential Aptitude Battery) and Age, predicted 
the averaged evaluation plus the time bonus 
to the extent of .53. The multiple correlation 
between the same four factors and a shop in- 
structor’s evaluation plus the time bonus was 
.57.


Received March 24, 1952. 





The Journal of Applied Psychology
Vol. 37, No. 1, 1953


The Changing of Mental Test Norms in a Southern 
Industrial Plant 


Joseph E. Moore
and
Union Bag and Paper Corporation, Savannah, Georgia


In 1947 Bennett and Wesman (1) pre- 
sented certain scores, which had been accumu- 
lated by Union Bag and Paper Corporation of 
Savannah, Georgia, on white men and women 
job applicants. The authors stated that for 
a given population local norms were the most 
meaningful. They also pointed out the often 
occurring problem of differences between local 
norms and “national’’ norms on which the test 
was originally based. 

The problem of changes occurring in a given 
plant population from year to year naturally 
arises. It was our hope that the present study 
would throw some light on this subject. Does 
the level of performance, as measured by the 
Revised Beta Examination, remain relatively 
stationary for job applicants over a period of 
four or five years in a particular industrial 
plant? 

The data on which this study was based 
cover a period from 1947 to 1951 and include 
all white men and women applicants who were 
given the Revised Beta Examination. The 
Union Bag and Paper Corporation requires all 
applicants who pass a preliminary interview to 
take a battery of psychological tests one of 
which is the Revised Beta Examination. 

The subjects used in this study were 8,818 
white men and 5,288 white women who ap- 
plied for work at the Union Bag and Paper 
Corporation between the years 1946 and 1951. 
The average score (all scores used in this 
study are unweighted) earned by the men on 
the Beta Test was 83.8; the Standard Devia- 
tion for these scores was 15.9. The median 
score for the men was 84.5. The Stanford 
Binet Mental Age equivalent for this group of 
men applicants is 14 years (2). The Otis 
Self-Administering Test, Higher Examination,
Form A, score equivalent for the average of
our group would be 33 points. 

The average score for the women was 77.4 
with a Standard Deviation of 15.8. The 
median for this group was 78.4. The Stanford 
Binet Mental Age equivalent for the women 
is 13 years, 1 month. A comparable score on 
the Otis Self-Administering Test, Higher Ex- 
amination, Form A, would be 23. 

The scores on the Revised Beta Examina- 
tion for the groups in this study were com- 
pared with the scores obtained by Bennett and 
Wesman in an earlier study in the same plant 
in 1947 (1). Table 1 presents the reliability 
of the difference between the means of these 
groups of men applicants. 


Table 1

Reliability of the Difference Between Mean Scores on The Revised Beta Examination for Men Industrial Applicants

Group                      Number   Mean   S.D.
Bennett & Wesman (1947)    1,362    80.5   17.7
Moore & Ross (1951)        8,818    83.8   15.9

** Significant at .01 level of confidence.


In Table 1 it will be seen that the two
groups of men industrial applicants are statistically
significantly different. The mean mental
test score earned by the 1951 men
applicants, however, is only 3.3 points higher than
the mean of the 1947 group studied by Bennett
and Wesman. This is less than one-fifth
of the Bennett and Wesman S.D. of 17.7.
The 1951 mean is at the 55 percentile point on
the 1947 percentile norms.


Table 2

A Comparison of The Mean Scores on the Revised Beta Examination of Two Groups of Women Applicants

Group                      Number   Mean   S.D.
Bennett & Wesman (1947)    1,083    72.9   17.5
Moore & Ross (1951)        5,288    77.4   15.8

** Significant at the .01 level of confidence.

In Table 2 it will be seen that a difference 
in the mean scores of the women applicants 
on the Revised Beta Examination has also 
occurred. The 1951 group is performing at 
a higher level on the Beta Test than was true 
of the 1947 group. The 4.5 points difference 
in the mean is slightly larger for the women 
applicants than was found in the case of the 
men applicants. This 4.5 point difference is 
about one-fourth of the Bennett and Wesman 
S.D. of 17.5. The 1951 mean is also at the 55 
percentile point on the 1947 percentile norms. 
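
The comparisons above rest on two quantities: the difference between means expressed in units of the 1947 S.D., and the significance of that difference. A minimal sketch follows, assuming an unpooled standard error for the difference between two independent means; the article does not state which formula was used, so the critical ratios printed here are illustrative and are not values reported by the authors.

```python
import math


def difference_in_sd_units(mean_old: float, sd_old: float, mean_new: float) -> float:
    """Difference between means as a fraction of the earlier group's S.D."""
    return (mean_new - mean_old) / sd_old


def critical_ratio(mean_old, sd_old, n_old, mean_new, sd_new, n_new) -> float:
    """Difference between means divided by its (unpooled) standard error."""
    standard_error = math.sqrt(sd_old ** 2 / n_old + sd_new ** 2 / n_new)
    return (mean_new - mean_old) / standard_error


if __name__ == "__main__":
    # Men: 1947 group versus 1951 group, values from Table 1.
    print(round(difference_in_sd_units(80.5, 17.7, 83.8), 2))
    print(round(critical_ratio(80.5, 17.7, 1362, 83.8, 15.9, 8818), 2))
    # Women: values from Table 2 and the text (1947 S.D. of 17.5).
    print(round(difference_in_sd_units(72.9, 17.5, 77.4), 2))
    print(round(critical_ratio(72.9, 17.5, 1083, 77.4, 15.8, 5288), 2))
```

With samples this large, even a difference of a fifth or a quarter of a standard deviation yields a large critical ratio, which is why the text separates statistical significance from the size of the change.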


Summary and Conclusion 


Bennett and Wesman reported on men and 
women applicants who took the Revised Beta
Examination prior to 1947. The data from
these two investigators were compared with 
men and women applicants who took the ex- 
amination between the years 1947 and 1951. 

The present study shows that the men and 
women applicants seeking employment at this 
paper plant between 1947 and 1951 earned 
statistically significantly higher scores than 
did the group seeking employment prior to 
1947. The difference between the mean scores 
of both the men applicants and the women ap- 
plicants was, however, not striking, being 
about one-fourth of the 1947 S.D. The 1951 
mean is at the 55 percentile point on the 1947 
percentile norms. 

The direction of the change is upward to- 
wards applicants who perform in such a way 
that they earn higher test scores on the Re- 
vised Beta Examination. The reason for these 
changes lies beyond the scope of this study.


The Validity of Personality Inventories in the Selection of
Employees 


Edwin E. Ghiselli and Richard P. Barthol 
University of California, Berkeley, California 


Industrial and governmental organizations 
have for many years utilized tests of various 
kinds as aids in the selection of employees. 
Certain types of tests, e.g., aptitude, profi- 
ciency, and intelligence tests, have been shown 
to have merit in improving selection tech- 
niques. In recent years personnel workers 
have become increasingly conscious of the im- 
portance of personality factors as contributors 
to employee satisfaction or unrest. Personal- 
ity tests and inventories have been used to 
supplement the subjective evaluation of these 
factors by the employment interviewers. A 
number of studies have been reported on the 
validity of personality inventories as selec- 
tion devices, but these have been widely scat- 
tered through the literature. The purpose of 
this report is to summarize these studies so 
that the usefulness of the personality inven- 
tory can be more easily assessed. 


Methods and Procedure 


In order to secure pertinent information we 
searched the various professional journals and 
books published from 1919 to date. From 
each study which reported findings concerning 
the validity of personality inventories for em- 
ployment purposes, we noted the validity co- 
efficient, the number of cases, and the job on 
which the group was employed. The studies 
included in the present analysis were re- 
stricted to those conducted in the United 
States, and to those in which the criterion was 
some index of job proficiency such as production
records or ratings by superiors. An attempt
was also made to include only those
studies in which the scoring key for the personality
inventory was developed independently
of the group for which the validity coefficient
was reported. Approximately 40%
of the material reported in this paper is un- 
published, having been drawn from various 
business, industrial, and governmental organ- 
izations. 


In selecting the data for this study we ex- 
amined the articles reporting the use of per- 
sonality inventories and excluded those report- 
ing traits that appeared to have little or no 
importance for the job in question. Thus, 
an inventory designed to measure sociability 
would be included for sales persons but not 
for machinists. We have grouped together all 
the remaining inventories regardless of the 
trait presumably measured. This was neces- 
sary because many utilize trait names that are 
very broad or not in general use, and some 
inventories do not name the trait at all. 


Results 


In order to show the general trends the 
weighted mean validity coefficient was com- 
puted through Fisher’s z for each of the major 
occupational groups. These values, together 
with the numbers of cases and numbers of 
validity coefficients, are given in Table 1. The
distributions of the validity coefficients by occupation
are shown in Figure 1.
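
Averaging correlations through Fisher's z can be sketched as below. The article does not say how the coefficients were weighted, so the weight of n - 3 per study (the inverse of the sampling variance of z) is an assumption, as are the function name and the sample figures in the usage lines.

```python
import math


def weighted_mean_r(coefficients, sample_sizes):
    """Weighted mean correlation via Fisher's z, weighting each study by n - 3."""
    if not coefficients or len(coefficients) != len(sample_sizes):
        raise ValueError("need one sample size per coefficient")
    weights = [n - 3 for n in sample_sizes]
    if any(w <= 0 for w in weights):
        raise ValueError("each sample must contain more than 3 cases")
    # Transform each r to z, average the z values, and transform back.
    z_values = [math.atanh(r) for r in coefficients]
    mean_z = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    return math.tanh(mean_z)


if __name__ == "__main__":
    # Hypothetical validity coefficients and numbers of cases.
    print(round(weighted_mean_r([0.10, 0.50], [40, 200]), 3))
```

Averaging in the z metric keeps large coefficients from being understated, and the weighting lets studies with more cases count for more in the mean.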

There have apparently been few studies
made on the efficacy of personality inventories
for higher level supervisors. Contrary to expectations,
the mean validity coefficient of
only .14 is low and the distribution is somewhat
scattered. There is one case of fewer
than 50 subjects with a substantial coefficient
of correlation.


Table 1

Weighted Mean Validity Coefficients of Personality Inventories for Various Occupational Groups

Occupation              Total No. of Cases
General Supervisors            518
Foremen                       6433
Clerks                        1069
Sales Clerks                  1120
Salesmen                       927
Protective Workers             536
Service Workers                385
Trades and Crafts              511

[The Total No. of r's and Mean columns did not survive extraction.]


Fig. 1. Distribution of validity coefficients of inventories for various occupational groups.

There were many studies reported for fore- 
men which support the conclusion that per- 
sonality inventories on the average do not 
have much predictive value in selecting super- 
visory employees. The mean and mode coin- 
cide at .18. Apparently certain inventories 
used under certain conditions give good pre- 
dictive results. 

The studies made on clerical workers in- 
dicate that reasonably good predictions can be 
made on the basis of personality inventories. 
The mean value of .25 and the group of co- 
efficients ranging from .50 to .65 demonstrate 
that this type of inventory can be seriously 
considered in devising a test battery for the 
selection of these workers. 

For both of the sales groups, sales clerks 
and salesmen, quite substantial validities have 
been found. While there have not been as
many studies with these groups as might be
expected the findings are fairly consistent. 
For both the mean validity coefficient is .36. 

We found only five studies in which scores 
on personality inventories were related to pro- 
ficiency among protective occupations. How- 
ever, all of these studies utilized sizeable num- 
bers of cases and are quite consistent in 
indicating moderate validity. The mean co- 
efficient is .24. 

In the studies of service workers the find- 
ings are quite inconsistent. Since the validity 
coefficients range from —.40 to +.50, the low 
mean validity coefficient for this occupational 
group cannot be considered a representative 
indication of the effectiveness of personality 
inventories. It appears that under certain 
circumstances inventories may be used effec- 
tively. 

The few applications of personality inven- 
tories to skilled workers have given quite 
promising results. The average of the valid- 
ity coefficients for the trades and crafts is 
.29. Furthermore, the findings from different 
studies are quite consistent. 


Discussion 


We were able to discover a total of 113 
studies dealing with the validity of personality 
inventories in employee selection. When one 
recalls that these studies are spread over a 
number of different occupations it is apparent 
that the amount of information available for 
the evaluation of inventories is by no means 
extensive. However, a similar survey of reports
concerning the validity of intelligence
tests, certainly a much more popular instrument
in employee selection, revealed only
some 450 studies. Thus while in absolute
terms the data may appear to be scanty, as
compared to those available for other types
of tests, they are fairly satisfactory.

It has been found that under certain cir- 
cumstances scores on personality inventories 
correlate better with proficiency on a wider 
variety of jobs than might be expected. On 
the other hand there have been enough studies 
reporting negative results to emphasize cau- 
tion in their use. These inventories have 
proved to be effective for some occupations in 
which personality factors would appear to be
of minimal importance (e.g., clerks, and trades
and crafts), and ineffective for other occupations
in which these factors could reasonably
be expected to be of paramount importance
(e.g., supervisors and foremen).


Over and Under Achievement in a Sales School in Relation
to Future Production 


Marion A. Bills and Jean G. Taylor 
Ætna Life Affiliated Companies, Hartford, Conn.


Beginning January 1, 1947, and based on 
previous experimental data, the Life Agency 
Department of the Ætna Life Affiliated Companies
decided that it would require all applicants
for selling positions to take three
tests. These were: (1) Strong’s Vocational 
Interest Blank; (2) the Aptitude Index pub- 
lished by the Life Insurance Agency Manage- 
ment Association (a scoring of an application 
blank and a personality test); and (3) LOMA 
Test 1-A, a mental alertness test published
by the Life Office Management Association. 

The Life Agency Department has regularly
conducted schools for the training of agents.
These schools have had various purposes and
entrance requirements, but one series of
schools conducted between January 1, 1947
and October 1, 1949, known as the “basic
schools,” was primarily for new agents, and
no previous selling experience or production
was required for admittance. It has been
noted from the first that there was a definite
relationship between the LOMA 1-A test
scores and the grades earned in the schools
but that this relationship was by no means
perfect. In addition, those who did better
in the school than their test scores would indicate
seemed to be more successful in future
selling. This result was not studied
statistically until this year, largely because the
information came too late in our selection procedure
to be of material benefit. However,
since the information has at least academic interest,
might prove valuable in certain borderline
cases, and since a large enough number
of cases have been accumulated to justify a
statistical study, we feel that it is advantageous
to report certain of our results.

To keep the group as homogeneous as possible
except for the two variables being
studied, LOMA 1-A test scores and school
grade, we limited the group to those who had
scored an “A” on the Life Insurance scale of
Strong’s Vocational Interest Blank and an “A”
on the Aptitude Index and had attended a
“basic school.” There were 91 agents who
met these requirements.

The grades in the “basic school” for these
91 agents ranged from 80 to 98 (S.D. = 3.96)
with a mean of 90. LOMA 1-A test scores
ranged from 99 to 209 (S.D. = 20.94) with a
mean of 146. The correlation (product moment)
between LOMA 1-A test scores and
school grades was .64. From
the regression equation a predicted school
grade was derived for each LOMA 1-A test
score. This predicted grade was then compared
to the actual grade received to give an
“index-of-achievement” score for each of the
91 individuals (actual grade minus predicted
grade). This “index-of-achievement” score
ranged from +7.8 to -6.6 and had a mean
of 0 and a standard deviation of 3.03. The
correlation between these discrepancies
and grade was .77, and between discrepancies
and LOMA 1-A test scores was -.01. Those
receiving positive scores were considered
“over” achievers, and those receiving negative
scores “under” achievers.
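
The index of achievement described above is the residual from a simple linear regression of school grade on test score. As a sketch built only from the summary statistics quoted in the paragraph (means, standard deviations, and r = .64), the regression line and the expected properties of the residuals can be worked out as follows. The function names are illustrative, and the example agent is hypothetical.

```python
import math

# Summary statistics quoted in the text for the 91 agents.
MEAN_GRADE, SD_GRADE = 90.0, 3.96
MEAN_TEST, SD_TEST = 146.0, 20.94
R_TEST_GRADE = 0.64


def predicted_grade(test_score: float) -> float:
    """School grade predicted from the LOMA 1-A score by linear regression."""
    slope = R_TEST_GRADE * SD_GRADE / SD_TEST
    return MEAN_GRADE + slope * (test_score - MEAN_TEST)


def index_of_achievement(actual_grade: float, test_score: float) -> float:
    """Actual grade minus predicted grade (positive means over-achiever)."""
    return actual_grade - predicted_grade(test_score)


def residual_sd() -> float:
    """Standard deviation expected for the index of achievement."""
    return SD_GRADE * math.sqrt(1 - R_TEST_GRADE ** 2)


def residual_grade_correlation() -> float:
    """Correlation expected between the index and the actual grade."""
    return math.sqrt(1 - R_TEST_GRADE ** 2)


if __name__ == "__main__":
    # Hypothetical agent: test score 160, school grade 95.
    print(round(index_of_achievement(95, 160), 2))
    print(round(residual_sd(), 2), round(residual_grade_correlation(), 2))
```

The derived standard deviation (about 3.04) and correlation with grade (about .77) agree closely with the reported 3.03 and .77, and a regression residual is by construction uncorrelated with the predictor, which matches the reported -.01.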

The achievement scores were divided into 
three groups with extreme “over” achievers 
falling +3.0 and over, and extreme “under” 
achievers —3.0 and under. Table 1 gives the 
results of a comparison between the achieve- 
ment scores and a combined criterion of length 
of service and premium production during the 
first year. 

A Chi Square test for Table 1 yields a value 
of 15.44 which, with four degrees of freedom, 
is significant at the .01 level. It is evident 
that extreme “over” achievers, those with a 
score of +3.00 or over, in contrast to extreme 
“under” achievers, those with scores of —3.00 
or less, tend more frequently to remain at 
least a year and to be higher producers. The 
one representative who was made an Agency 
Assistant before the end of the first year was 
an extreme “over” achiever and fell in the
highest production group even with only
eleven months of production represented.


Table 1

Over and Under Achievement in the Sales School versus Length of Service and Premium Production in the First Year

Columns: Index of Achievement (Actual School Grade Minus Predicted School Grade); Per Cent of Agents Who Terminated Prior to 1 Year or Produced Less Than $2500 or Both; Per Cent of Agents Who Remained 1 Year and Produced Between $2500-$4999; Per Cent of Agents Who Remained 1 Year and Produced $5000 or Over.

[Most cell values did not survive extraction; the surviving entries are shown in the order extracted.]

+3.00 and over       39%
+2.99 to -2.99       37    12
-3.00 and under      13     6


Table 2

Over and Under Achievement in the Sales School versus Length of Service and Total Two-Year Production

Index of Achievement
(Actual School Grade
Minus Predicted
School Grade)           N     (a)    (b)    (c)

+3.00 and over         18     22%    33%    45%
+2.99 to -2.99         57     55     30     16
-3.00 and under        16     75     19      6

(a) Per Cent of Agents Who Terminated Before the End of 2 Years or Produced Less Than $5000
(b) Per Cent of Agents Who Remained 2 Years and Produced $5000-$9999
(c) Per Cent of Agents Who Remained 2 Years and Produced $10,000 or Over or Became Agency Asst’s*

* Persons charged with agency management responsibilities and not engaged primarily in the sale of Life Insurance for their own accounts.

In addition to success in the first year as 
treated in Table 1 we were also interested in 
following the same line of approach with total 
production during two years. Using the same 
breakdown of achievement scores but with a 
different division of the combined criterion of 
length of service and premium production, 
Table 2 was constructed. 


A Chi Square based on Table 2 is 14.00 
which, with four degrees of freedom, is signifi- 
cant at the .01 level. Table 2 indicates that 
the same general results persist over a two- 
year period. 
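
The significance levels quoted for the two Chi Square values can be checked directly. The sketch below is illustrative only: it uses the closed-form upper-tail probability of a chi-square variate with four degrees of freedom, exp(-x/2)(1 + x/2), and the function name `chi2_sf_df4` is a hypothetical helper.

```python
from math import exp

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variate with 4 degrees of freedom.

    For even degrees of freedom the tail has a closed form; with 4 d.f. it is
    exp(-x/2) * (1 + x/2).
    """
    return exp(-x / 2) * (1 + x / 2)

# Chi Square values as reported in the text for Tables 1 and 2.
reported = {"Table 1": 15.44, "Table 2": 14.00}

for table, value in reported.items():
    p = chi2_sf_df4(value)
    print(table, value, round(p, 4), "significant at .01" if p < 0.01 else "not significant at .01")
```

Both tail probabilities fall below .01, in agreement with the statements in the text.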

In the above discussion, +3.00 and —3.00 
were chosen as points where we could be rea- 
sonably sure that no chance variation in the 
school grading would account for the differ- 
ence. However, it is of interest to note that 


Table 3 


First Year Production or Total 
Production if Left Under 
Twelve Months 
$2500-$4999 


Relation 
of Actual 
School Grade 
to Predicted N N % 


Over 49 16 
Under 42 


Under $2500 


299 


Over and Under Achievement in the Sales School versus First Year Production 


33% 19 


No. and Per Cent of 
Agents Who 
Remained 12 

Months or More 
(Includes 1 Made 
Agency Assistant) 


% 


39% 
29 










Table 4 





Total Two-Year Production or Total 
Production if Left Prior to 
End of Two Years 





Relation 
of Actual 
School Grade 
N 





3 


Under 


we obtain the same general results if the break 
between “over” and “under” achievers is made 
at zero although we find, as would be expected, 
that the differences are not as pronounced. 
These results are given in Tables 3 and 4. 


Summary 


The results reported in this study indicate 
that agents who receive a score of “A” on the 
Life Insurance Scale of Strong’s Vocational 
Interest Blank, a score of “A” on the Aptitude 


Over and Under Achievement in the Sales School versus Total Two Years’ Production 


No. and Per Cent of 
Those Contracted 
Who Remained at 
Least Two Years 
(Includes Agency 

Assistants) 


Made 
Agency 
Assistant 

(as of 
July, 1952) 


11 
0 


$10,000 
and Over 


N % 


~~ 16 “3 


Oo 
/0 


67% 
38 


Index, and who achieve a higher grade in the 
“basic school” than would be predicted from 
their LOMA score: (1) remained with the 
company longer; (2) produced more paid 
premiums; and (3) were promoted to super- 
visory positions oftener than the agents who 
did not achieve a “basic school” grade as high 
as their LOMA test score would predict. 


Received September 11, 1952. 
A Personality Study of Professional and Student Actors * 


Chalmers L. Stacey 


Syracuse University 


For a long time it has been the contention 
of the authors that a great deal of frustration 
and heartbreak could be avoided if some meas- 
ures of ability to attain success could be estab- 
lished for young people who wish to act in the 
professional field. Year after year thousands 
of young hopefuls flood the offices of the 
theatrical agents of Broadway and Hollywood 
determined to make a name for themselves. 
In many cases their decisions have been based 
on the fact that someone said that they were 
the possessors of pretty or handsome faces. 

At the present time there is no standard for 
measuring or predicting the success a young, 
would-be actor hopes to attain. However, the 
purpose of this experiment is not to establish 
such an over-all criterion but rather to deter- 
mine some descriptive elements of the person- 
alities of the groups studied. 

Personality was selected as the basis of 
measurement for more than the reason of ex- 
pediency. It was felt that personality as well 
as talent was one of the basic factors for at- 
taining a certain amount of success in the field 
of acting. Discussion with the Broadway ac- 
tors who were used as subjects in the study 
seemed to verify this fact. 

It is hoped that the significant knowledge 
contained in the following material will be 
used to understand further the bewildering 
position of the young people who have chosen 
for themselves the difficult art of acting. 


Problem 


The present experiment was designed in 
order to answer the following questions: 

A. Do students in the School of Speech and 
Dramatic Art, Syracuse University, who ex- 
press the desire to become professional actors 
and who appear in the major productions at 

* The authors express their thanks for the assistance 
of Mr. Robert Breen of the American National 
Theatre and Academy; Mr. Clarence Derwent, President 
of Actors Equity; Professor Sawyer Falk, Director 
of Dramatic Activities at Syracuse University; 
and Mr. S. Eugene Perlman, assistant at Hofstra 
College. 


and 
the Civic University Theatre, have a pattern 
of personality traits similar to that of profes- 
sional actors? 

B. Do students in the School of Speech and 
Dramatic Art, Syracuse University, who ex- 
press the desire to become professional actors 
but for various reasons do not appear in the 
major productions at the Civic University 
Theatre have a pattern of personality traits 
similar to professional actors? 

C. Do students in the School of Speech and 
Dramatic Art, Syracuse University, who ex- 
press the desire to become professional actors 
and who appear in the major productions at 
the Civic University Theatre have a pattern 
of personality traits closer to that of profes- 
sional actors than do students in the School 
of Speech and Dramatic Art, Syracuse Uni- 
versity, who express the desire to become pro- 
fessional actors but for various reasons do not 
appear in the major productions at the Civic 
University Theatre? 


The Experimental Situation 


Subjects. The following three groups were 
tested: (a) A total of 74 professional actors 
with a minimum of five years’ professional 
experience; (b) 30 students of the School of 
Speech and Dramatic Art who appeared in the 
University productions; and (c) 100 students 
of the School of Speech and Dramatic Art who 
did not appear in University productions. 

The Work of the Experimenter. The ex- 
perimenter presented the questionnaires to the 
subjects; ensured no communication between 
subjects once the examination period began; 
and answered only questions concerning spe- 
cific items. 

The Material. Two personality question- 
naires were used during the course of this ex- 
periment. These were J. P. Guilford’s An 
Inventory of Factors STDCR which tested 
the factors: (S) Social Introversion-extraver- 
sion, (T) Thinking Introversion-extraversion, 
(D) Depression, (C) Cycloid Disposition, 
Table 1 


Showing the Variances for: (1) Professional Actors, 
(2) Student Actors Who Appeared in University Pro- 
ductions, and (3) Student Actors Who Did Not Appear 
in University Productions on Factors S T D C R 
and O Ag Co of the Inventories 


Group I 


79.2 

84.3 
141.4 
136.6 
132.7 149.5 
184.2 184.2 
108.0 75.7 
190.2 226.7 


Group II Group III 


64.7 
138.7 
136.9 
136.9 


76.8 
100.9 
125.6 
134.7 

96.4 
200.0 
120.0 
345.5 


Table 2 


Showing “F” and “t” Values for the Differences of 
Variances and Means Among the 
Three Groups 


Groups I 
and II 


Groups I 
and III 


Groups II 
and III 





_— = —. US ‘F oe 
1.1 2.8** 1.1 yo ‘ 9 
A lee sy" F 1.1 
in 26 3.4** ; a 
1.0 1.3 ; 1.1 ol 

~ 1.0 1.6 >” é 1.5 
1.0 1.6 a 
1.4 A ‘ S 
1.2 5 i 1.3 


* Significant at the 5 per cent level. 
** Significant at the 1 per cent level. 


(R) Rhathymia; and The Guilford-Martin 
Personnel Inventory which tested the fac- 
tors: (O) Objectivity, (Ag) Agreeableness, 
(Co) Cooperativeness. 

The Data and Their Analyses. For the 
purposes of this study two statistical tests 
were used to analyze the data: the F test to 
test the differences in variances and the t test 
to test the differences in means. Tables 1 
and 2 present these findings. 
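
As a minimal sketch of the two tests named above, the following Python computes a variance ratio (F) and a t for the difference in means between two groups. The scores are hypothetical, and the pooled-variance form of t is an assumption, since the article does not state which formula was used.

```python
from math import sqrt
from statistics import mean, variance

def f_ratio(a, b):
    """Ratio of the larger sample variance to the smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

def pooled_t(a, b):
    """Student's t for the difference in means (pooled variance; an assumption)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

# Hypothetical factor scores for two groups (not data from the study).
group_i = [1, 2, 3, 4, 5]
group_ii = [2, 4, 6, 8, 10]

print("F =", f_ratio(group_i, group_ii))
print("t =", round(pooled_t(group_i, group_ii), 3))
```

Each value would then be referred to the F or t distribution with the appropriate degrees of freedom to obtain the 5 and 1 per cent significance marks used in Table 2.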


Conclusions 


The following conclusions were arrived at 
after an analysis of the data: 
On the factors, Cycloid Disposition, Objec- 


tivity, Agreeableness, Cooperativeness, there 
was no difference in degree of personality trait 
between the professional actors and student 
actors who did not appear in university pro- 
ductions. 

Professional actors, however, are significantly 
more shy and seclusive, and have a greater 
tendency to withdraw from social contacts 
than do student actors who do not appear in 
university productions (Factor S). 

It may also be said that student actors who 
do not appear in university productions have 
significantly more of the tendencies to seek 
social contacts and enjoy the company of 
others (Factor S). 

Professional actors are significantly more 
inclined to meditative thinking, philosophiz- 
ing, analyzing one’s self and others than are 
student actors who do not appear in university 
productions. These student actors tend sig- 
nificantly toward an extravertive orientation 
of the thinking processes (Factor T). 

Professional actors show significantly more 
signs of depression than do students who do 
not appear in university productions. Fur- 
thermore, student actors who do not appear 
in university productions are significantly 
more cheerful and optimistic than professional 
actors (Factor D). 

Professional actors are significantly more 
inhibited, over-controlled, conscientious and 
serious-minded than student actors who do not 
appear in university productions (Factor R). 

Student actors who do appear in university 
productions and student actors who do not 
exhibit about the same degree of personality 
in all eight traits measured. 

Professional actors and student actors who 
appear in university productions exhibit about 
the same degree of personality in traits of 
Cycloid Disposition, Rhathymia, Objectivity, 
Agreeableness and Cooperativeness. 

The student actors who did appear in uni- 
versity productions were like professional ac- 
tors on six and significantly different on only 
two of the eight traits measured. However, 
student actors who did not appear in univer- 
sity productions were like professional actors 
on four and significantly different on four of 
the eight traits measured. 


Received March 10, 1952. 
Factors Influencing Reliability and Validity of Leaderless 
Group Discussion Assessment ¹ 


Bernard M. Bass, Stanley Klubeck and Cecil R. Wurster 


Louisiana State University 


A substantial body of evidence is available 
to suggest that behavior in the initially leader- 
less group discussion is indicative of leadership 
potential for a fairly wide range of situations 
(1, 2, 4, 5, 6, 8, 10, 11, 12, 13, 14). However, 
the leaderless discussion technique has one 
obvious handicap, at least. Since very few 
candidates can be placed in a single discussion, 
many discussions are likely to contain unrep- 
resentative samples of the population being as- 
sessed. Some discussions may contain only 
persons high in leadership potential; others 
may contain only persons low in such poten- 
tial. 

Since it has been shown that a person’s 
leaderless discussion rating will be lower, the 
more candidates he must compete with in a 
given discussion (3), it seems reasonable to 
hypothesize that a candidate’s ratings will de- 
pend also to some extent on the “quality” as 
well as the quantity of those he is facing. The 
leadership potential and the effectiveness of 
discussion behavior among his fellow candi- 
dates will probably affect a candidate’s rating 
even where raters attempt to use one standard 
for rating many discussions. 

If these two sources of assessment error— 
variations from discussion to discussion in 
quantity and “quality”—were unrelated to 
variations in the reliability and validity of the 
LGD, they would cease to be a problem. On 
the other hand, if variations in the avail- 
able leadership potential, in number of par- 
ticipants, and in effectiveness of discussion 
behavior of participants as a whole, were sys- 
tematically related to the reliability and valid- 
ity of the technique, then it would be profit- 
able to isolate these errors and make appro- 
priate allowances for them in the future. 
Also, the more reliability and validity of the 
technique were found to vary from discussion 
to discussion, the more would these allowances 
be necessary. The purpose of this investigation 
was to determine the extent to 
which the reliability and validity of the LGD 
varied from discussion to discussion and the 
extent to which these variations could be accounted 
for by other known LGD variables 
such as the quantity and “quality” of participants. 


1 This study was aided by a grant from the Louisiana 
State University Council on Research. 


Method 


The investigators had available the assess- 
ment and criterion data from LGD validation 
studies by Wurster and Bass (14) based on 
14 discussions among fraternity pledges; by 
Doll (6) based on 20 discussions among 
sorority members and by Bass and Coates 
(2) based on 21 discussions among Army 
cadets and 12 discussions among Air Force 
cadets. 

For each of the 67 discussions the following 
indices were computed: $r_{cd}$ = the validity of 
LGD observers’ ratings as indicated by their 
correlation with a criterion of peers’ or superiors’ 
appraisals of the participants’ leadership 
potential;² $r_{xx'}$ = the reliability of the 
two LGD observers’ ratings as estimated by 
the correlation between them; $M_d$ and $SD_d$ = 
the mean and standard deviation of assigned 
LGD ratings; $M_c$ and $SD_c$ = the mean and 
standard deviation of participants’ criterion 
ratings; K = the number of participants in a 
designated discussion; and E = the extent to 
which a designated discussion attained its objectives 
as rated by both observers on a five-point 
scale. The corrected split-half reliabilities 
of both observers’ ratings of this measure 
of group effectiveness for the 4 studies 
respectively were .74, .75, .78 and .85. 

2 For a discussion of how these ratings were made 
the reader is referred to the original studies (2, 6, 
14). 
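
The indices defined above can be sketched for a single discussion as follows. This is an illustration under stated assumptions: the ratings are hypothetical, the discussion rating d is taken as the sum of the two observers' ratings (x + x'), the population form of the standard deviation is assumed, and E is omitted because it is a direct rating rather than a computed quantity. The names `pearson_r` and `discussion_indices` are not from the article.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson_r(u, v):
    """Pearson product-moment correlation between two equal-length lists."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

def discussion_indices(x, x_prime, criterion):
    """Indices for one discussion from two observers' ratings and a criterion."""
    d = [a + b for a, b in zip(x, x_prime)]  # combined LGD rating, d = x + x'
    return {
        "r_cd": pearson_r(criterion, d),   # validity
        "r_xx": pearson_r(x, x_prime),     # inter-observer reliability
        "M_d": mean(d),
        "SD_d": pstdev(d),
        "M_c": mean(criterion),
        "SD_c": pstdev(criterion),
        "K": len(d),                       # number of participants
    }

# Hypothetical ratings for a four-person discussion (not data from the study).
indices = discussion_indices([1, 2, 3, 4], [2, 3, 5, 4], [10, 20, 30, 50])
for name, value in indices.items():
    print(name, round(value, 3))
```

Repeating the computation for each of the 67 discussions would give the per-discussion values whose means and standard deviations are summarized study by study.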
Table 1 


Subjects No. Groups 


Fraternity pledges (14) 14 
Sorority members (6) 20 
Army cadets (2) 21 
Air Force cadets (2) 12 
Weighted Average* 


Fraternity pledges (14) 14 

Sorority members (6) 20 

Army cadets (2) 21 , 
Air Force cadets (2) 12 39 
Weighted Average* 43 


* Used 7-item rather than 9-item rating scale. 


Variations in Reliability and Validity 

Table 1 shows the means and standard de- 
viations of the above measures study-by- 
study. Of special interest to the investigators 
were the following conclusions inferred from 
the results reported in Table 1. 

1. The reliability of LGD observers’ rat- 
ings appeared quite stable as judged from the 
standard deviation of these reliabilities of .12. 
Judging from the average mean reliability of 
.84, it appears that this measure suffered little 
when obtained discussion-by-discussion — in 
comparison to previous studies where it was 
obtained from pooled data. 

2. The average validity of the LGD of .27 
was less than obtained for the same data when 
analysis was performed in previous studies by 
pooling many discussions. This suggested 
that the validity of the LGD would be lowered 
when use was made of any rating technique 
such as comparison rankings among discussion 
participants that did not make provision for 
between-discussion variations in the candi- 
dates. 

3. The average standard deviation of the 
validity coefficients of .43 indicated that there 
was tremendous variability in the validities 
from discussion to discussion—almost as much 
variation as the correlation scale of +1.00 to 
—1.00 would allow. Therefore, it was de- 
duced that any factors which could be found 


Number of Groups, Means and Standard Deviations of $r_{cd}$, $r_{xx'}$, $M_d$, $SD_d$, $M_c$, $SD_c$, E, and K 


Means 


a a” aoe 


14.9 40.7 5 
11.0* 
17.1 
17.2 


30.2 
19.2 
39.3 


Standard Deviations 


09 7.0 3.8 6.9 3.0 
10 3.4* 2.6% 4.3 2.5 
5 4.8 3.1 25 1.7 
12 6.3 3.6 11.4 6.0 
12 - — 


* Only computed for variables whose scales remained the same for all four samples of subjects. 


to correlate with the validity coefficient, even 
if the correlations were quite low, could ac- 
count for a wide variation in validity. 


Correlational Analysis 


Pearson product-moment intercorrelations 
were computed between the various group 
measures. Each intercorrelation was computed 
separately for each of the 4 samples because 
of the variations in rating scales used 
from study to study. The correlations were 
transformed by means of Fisher’s Z conversion 
and then averaged. 
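
The averaging step can be sketched as below. Two assumptions are made that the article does not state: the four transformed values are given equal weight, and the mean Z is converted back to a correlation. The sample correlations and the name `fisher_average` are hypothetical.

```python
from math import atanh, tanh

def fisher_average(correlations):
    """Average correlations through Fisher's Z (equal weights; an assumption)."""
    z_values = [atanh(r) for r in correlations]   # r -> Z
    mean_z = sum(z_values) / len(z_values)
    return tanh(mean_z)                           # mean Z -> r

# Hypothetical correlations of one pair of variables in the four samples.
sample_rs = [-0.45, -0.30, -0.40, -0.32]
mean_r = fisher_average(sample_rs)
print(round(mean_r, 2))
```

A weighted version would multiply each Z by a weight such as the number of discussions in the sample before averaging; which variant was used here is not stated.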

Several comments concerning this matrix 
shown in Table 2 appear pertinent: 

1. The significantly negative correlation of 
—.37 between group size and mean discussion 
ratings assigned corroborated similar results 
obtained by Bass and Norton (3), who varied 
group size systematically from 2 to 12. Thus, 
even where size differences were small and ac- 
cidental as in the present analysis, substantial 
variations in LGD leadership ratings assigned 
were found associated with variations in the 
number of participants per discussion. 

2. The significant correlation of .46 be- 
tween rated discussion effectiveness and mean 
LGD ratings assigned group-by-group sug- 
gested that the discussion observers had a con- 
sistent frame of reference in making these two 
ratings which transferred from one group to 










another, since, by definition, leadership ratings 
of individuals were supposed to depend on the 
degree to which they moved their group to- 
wards its goal while group effectiveness was 
defined as degree of goal attainment. 

3. Possibly the most valuable finding of this 
analysis was the significant correlation of .35 
between the mean discussion rating assigned 
and the mean criterion status of discussion 
participants. This suggested strongly that 
absolute ratings of the discussion observers 
were accurately sensitive to group variations 
in outside leadership potential. It implied 
that there was a “between groups” positive 
correlation as well as a “within groups” positive 
correlation between LGD ratings and 
outside appraisals of leadership potential. It 
was the validity due to “between groups” 
covariance which was lost when correlational 
analyses between test and criterion were run 
group-by-group rather than for an entire sam- 
ple. This probably accounted for the low 
mean validity of .27 based on group-by-group 
analyses reported in Table 1 in contrast to the 
validities of .40 and .50 reported when data 
are pooled. 
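
The loss of “between groups” covariance can be illustrated with a deliberately extreme, hypothetical example: two groups in which LGD ratings and criterion ratings are uncorrelated within each group, but in which the group that is higher on one measure is also higher on the other. The data and the helper name `pearson_r` are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean

def pearson_r(u, v):
    """Pearson product-moment correlation between two equal-length lists."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

# Hypothetical (LGD rating, criterion rating) pairs for two discussions.
group_a = ([1, 2, 1, 2], [1, 1, 2, 2])   # low-potential group
group_b = ([5, 6, 5, 6], [5, 5, 6, 6])   # high-potential group

within = [pearson_r(lgd, crit) for lgd, crit in (group_a, group_b)]
pooled = pearson_r(group_a[0] + group_b[0], group_a[1] + group_b[1])

print("mean within-group r:", sum(within) / len(within))
print("pooled r:", round(pooled, 2))
```

Here the group-by-group validities are zero while the pooled validity is high, because all of the covariance lies between the groups. Real data would show a less extreme version of the same effect.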

4. The positive but insignificant correlation 
of .20 between the group-by-group standard 
deviations of discussion ratings and the group- 
by-group standard deviations of criterion rat- 
ings suggested that the observer’s ratings were 
also somewhat sensitive to the variation in re- 
strictions in range of the outside leadership 
potential among the participants. Once again, 
it is obvious that ratings which depend solely 
on standards within a group are most likely to 
suffer in validity. Rating techniques with this 
disadvantage include the forced distribution 
method, the paired comparison technique or 
the rank order-of-merit procedure where 
quotas, pairings or rankings are made within 
a designated group situation. Similarly, any 
ratings of each other by the candidates them- 
selves will most likely be attenuated in valid- 
ity since they will depend solely upon stand- 
ards based on observation of a single discus- 
sion. 

5. Despite the above relationships, while 
the means and standard deviations of criterion 
ratings were significantly positively related 
(r = .42), the means and standard deviations 
of discussion ratings were significantly nega- 


tively related (r= —.28). This suggested 
that average and poor discussion participants 
were handicapped most severely when in com- 
petition with those participants of the entire 
sample who earned extremely high LGD 
ratings. 

6. The reliability or extent of agreement be- 
tween the two discussion observers appeared 
significantly related (r = .54) with the stand- 
ard deviation of the discussion ratings. How- 
ever, this correlation was an artifactual rela- 
tionship, since by a simple transposition of 
the formula for the standard deviation of the 
sum of correlated scores it can be shown that 

$$r_{xx'} = \frac{SD_d^2 - SD_x^2 - SD_{x'}^2}{2\,SD_x\,SD_{x'}}$$

where x and x’ refer to each of the 2 observers’ ratings and 
where x + x’ = d. 

7. A significantly negative correlation of 
-.32 was found between mean discussion rat- 
ings assigned and the reliability of ratings. A 
possible explanation for this correlation of- 
fered by the observers was that they, the ob- 
servers, became more interested and absorbed 
in discussions with very good participants 
while remaining more detached when partici- 
pants were poorer. This same hypothesis was 
used to account for the one highly significant 
curvilinear relationship which was found to 
exist among the variables. For each of the 
four samples respectively, etas of .54, .78, .73 
and .68 were found between the rated effec- 
tiveness of the group discussion and the extent 
to which the observers agreed on the leader- 


Table 2 


Mean Intercorrelations Among $r_{cd}$, $r_{xx'}$, $M_d$, $SD_d$, $M_c$, 
$SD_c$, E, and K* 


$r_{cd}$ $r_{xx'}$ $M_d$ $SD_d$ $M_c$ $SD_c$ E 


So Rm BAB LD. 
54-09 03 . 

—28 35 —.03 . 

%. m. 

42. 





* With 65 d.f., p < .05 when r = .24; p < .01 when 
r = .31. All correlations significant at and below the 
5 per cent level of confidence are in boldface type. 
ship ratings they assigned. Agreement among 
observers reached a maximum in groups of 
average effectiveness or goal attainment while 
reliability of discussion ratings was low in 
both extremely effective and extremely ineffec- 
tive groups. 

8. The validity of the LGD was correlated 
at the 5 per cent level of confidence with three 
variables, the group-by-group means of crite- 
rion ratings, as well as the group-by-group 
standard deviations of discussion and criterion 
ratings. Since these three variables were all 
related to each other, it was difficult at this 
point to determine which ones were uniquely 
related with LGD validity. 


Multiple Correlational Analyses 


It was decided to isolate the unique contri- 
bution to the variance of the validity of each 
of the other 7 variables of this investigation. 
This was done by determining the multiple 
correlation between the validity of the LGD 
and an optimally weighted sum of scores de- 
rived from the other 7 variables. The Doo- 
little rather than the Wherry-Doolittle solu- 
tion was used to obtain the multiple R and the 
beta weights since interest of the investigators 
was focused on studying the effects of all the 
other variables on validity rather than the ef- 
fects of the smallest number of the other vari- 
ables which would yield the highest multiple 
correlation with LGD validity. The multiple 
correlation obtained was .43 indicating that 
approximately 19 per cent of the variance in 
LGD validity from group-to-group could be 
accounted for by those other variables. Of 
this 19 per cent, 6 per cent was accounted for 
by the standard deviation of criterion ratings; 
5 per cent, by the standard deviation of LGD 
ratings; 4 per cent, by mean criterion scores; 
3 per cent, by group size; and 1 per cent, by 
group effectiveness. These results suggest 
that: 

1. The validity of a given discussion clearly 
suffered when there was a restriction in range 
of criterion ratings. This particular handicap 
will probably always remain with any group 
situational test where a candidate’s ratings, at 
least to some extent, depend upon the particu- 
lar combination of candidates with whom he 
happens to be grouped for assessment. 


The effect on validity of criterion variations 
from discussion to discussion also suggests 
that increased effort must be directed toward 
training the observers to develop a standard 
frame of reference which transcends any 
given group discussion. 

2. The LGD may be expected to demon- 
strate greater discriminability among those 
higher on criteria of leadership potential than 
among those lower on such external criteria. 

3. LGD validity may be raised by increas- 
ing the standard deviations of discussion rat- 
ings. Aside from further training of the raters 
and more emphasis on forcing the raters to 
make greater discriminations, a number of 
ways may be suggested to increase the validity 
of the LGD. 

a. The discussion time may be 
lengthened from 30 minutes to an hour or 
more with the expectation that greater stratification 
in status may occur, although no evidence 
is available to support this contention. 

b. All the candidates can be coached briefly 
on how to be successful discussion leaders. 
Klubeck and Bass (9) have shown that while 
brief coaching raises significantly the LGD 
ratings of participants who are fairly success- 
ful initially without such training, such train- 
ing does not alter the LGD ratings of those 
who have been found initially unable to 
emerge as discussion leaders. This would sug- 
gest that briefly coaching all participants 
would lead to a greater dispersion in the LGD 
behavior, although, of course, a long period of 
training might be expected to do otherwise. 

4. Size appeared negatively related to valid- 
ity. However, the relationship was too low to 
be anything but suggestive. Although groups 
varied only from 6 to 8 in size in these analy- 
ses, these variations accounted for 3 per cent 
of the variance. A study may be warranted 
of the relation between group size and validity 
similar to Bass and Norton’s (4) analyses of 
the relation between size and reliability. 

Similar Doolittle analyses were made to see 
the extent to which each of the other 7 vari- 
ables contributed to the variance of the reli- 
ability of discussion ratings from group to 
group and the extent each of the other 7 vari- 
ables contributed to the variations in efficiency 
from group to group. The two obtained multiple 
correlations were .59 and .56 respectively; 
however, little further knowledge was 
added to understanding of the relationships 
among the variables than had been found by 
inspection of the correlation matrix. 
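
The proportion of variance accounted for by a multiple correlation is its square. As a small arithmetic check, sketched in Python with the reported values (the helper name `variance_explained` is hypothetical):

```python
def variance_explained(multiple_r):
    """Proportion of criterion variance accounted for by a multiple R."""
    return multiple_r ** 2

# Multiple correlations as reported in the text.
reported_r = {"validity": 0.43, "reliability": 0.59, "effectiveness": 0.56}
for label, r in reported_r.items():
    print(label, round(100 * variance_explained(r), 1), "per cent")

# Percentage contributions to validity variance listed in the text.
shares = {
    "SD of criterion ratings": 6,
    "SD of LGD ratings": 5,
    "mean criterion scores": 4,
    "group size": 3,
    "group effectiveness": 1,
}
total = sum(shares.values())
print("sum of listed contributions:", total)
```

Squaring .43 gives about .185, which the text reports as approximately 19 per cent; the five listed contributions sum to that same 19 per cent.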


Summary 


In order to determine the extent to which 
it was possible to account for variations in the 
reliability and validity of the leaderless group 
discussion, the means and standard deviations 
were computed for 8 variables along which 
LGD’s vary. Also, a mean intercorrelation 
matrix was computed among these 8 variables. 

The most important findings of this and re- 
lated analyses were that: 

1. The validity of the individual LGD 
varied greatly from discussion to discussion 
while the reliability of LGD ratings appeared 
quite stable. 

2. Absolute ratings of leadership perform- 
ance by LGD observers appeared accurately 
sensitive to variations from discussion to dis- 
cussion in the outside leadership status of the 
participants. 

3. Discussion observers’ ratings agreed most 
closely when discussions were average in ef- 
fectiveness rather than extremely effective or 
ineffective. 

4. The validity of a given LGD was higher, 
the higher the outside leadership status of the 
participants in the discussion, the more strati- 
fied this status, and the more diverse the LGD 
ratings the observers were able to assign. 

These results suggested a number of ways 
in which it might be possible to raise the 
validity of the LGD for assessing leadership 
potential. 


Received March 3, 1952. 
Validity of the Strong Vocational Interest Blank Nursing 
Key 


Leslie Navran 


San Francisco State College 


In a study done in 1947 (3), the Strong 
Vocational Interest Blank was administered to 
two groups of girls who were entering nursing 
training at Stanford University and San Jose 
State College. The Stanford group (N = 26) 
had a mean score of 43.8 points on the Strong 
nursing scale. The mean score of the San Jose 
State group (N = 44) was 40.1. 

In 1949, a follow-up revealed that 59 of the 
70 girls had completed the first two years of 
the three-year program. The mean nursing 
scale score of this surviving group was 42.0. 
According to the manual for the Strong 
Blank (5), a score of 41 is the dividing line 
between the B and B-plus letter grades. 
Thus, the girls entering the last year of train- 
ing (who were considered by school officials 
as being almost certain to graduate) ¹ had an 
average nursing scale score equal to only the 
16th percentile of the standardization group. 
Twenty-six of them had scores in the B, B- 
minus, and C range. These results have im- 
portant implications with respect to the valid- 
ity of the nursing key and the use of the key 
in vocational counseling. 

Previous reports have indicated that in the 
past the nursing key has been more useful. 
In 1939, Hilgard (1) found that “those with 
ratings on the Strong below ‘A’ in nursing 
showed little likelihood of completing the 
nurses’ training course.” Moreover, all the 
girls in her sample who scored below A had 
dropped out of training by the end of the first 
year. Roper (4) reported an average nursing 
scale score of 57.1 (in the A range) for 33 
high school senior girls who were interested 
in nursing. 

It is true, of course, that Strong’s test does 
not purport to measure success in getting 
through school. Rather, it purports to meas- 
ure the interests of women who have con- 
tinued in nursing for a considerable period of 


1 The writer has been informed that 24 of the 26 
girls at Stanford completed their training successfully. 
time after completion of training. Nevertheless, 
the Strong test has been used to counsel 
students prior to their entrance into training, 
and the contrast between present-day and past 
results with the nursing key indicates that 
vocational counselors should now interpret B 
and C scores on the nursing key with caution 
because the predictive value of such scores 
may have lessened materially. 

In view of the small size of the samples used 
by Navran (3), it may be rash to state flatly 
that a revision of the nursing scale is needed. 
However, the following considerations lend 
support to the possibility that such is the case: 

It has been necessary in recent years to re- 
vise some of the scales on the men’s form of 
the Strong (2, 6), and where marked changes 
in the scale have resulted, they have been at- 
tributed to developments in the occupation it- 
self which made for changes in the composi- 
tion of the people engaged in the occupation. 
There is evidence that this may also be true 
of nursing. For one thing, partly as a func- 
tion of World War II and the current Korean 
conflict, there has been a serious shortage of 
nurses. The effect of this has been to recruit 
heavily for the profession, and this may be 
bringing girls into nursing who differ from 
the standardization group, but who nonethe- 
less can and will become nurses. This is an- 
other way of saying that nursing may be draw- 
ing from a wider segment of the general popu- 
lation in terms of measured interests than was 
formerly the case. 

Related to this is the discrepancy in age 
between nursing trainees and the standardi- 
zation group. Inspection of the Strong man- 
ual (5) reveals the standardization group to 
have been 34 years old, on the average, when 
tested in 1942. This means that girls pres- 
ently graduating from high school and enter- 
ing nursing training are approximately 15 
years younger than the standardization group 
with whom they are being compared. This 
age difference may also be a factor making
for different likes and dislikes in the present- 
day nursing trainees. 

Finally, and perhaps most importantly, 
nursing itself has become more complex and 
proliferated. There is an increasing differen- 
tiation being made between the practical nurse 
and the professional nurse. Also, specializa- 
tion in psychiatric nursing is growing more 
common, perhaps as a function of the growth 
and development of psychiatry and clinical 
psychology. It should be noted, too, that “the 
only revised scales which have differed ap- 
preciably from the old scales are the physician 
and psychologist scales.”² Since these professional
people with whom nurses are in close
association have changed so greatly, it may be 
reasonable to hypothesize that nurse trainees 
who can get along well with them may also be 
quite different from their older and successful 
fellow-nurses. This is speculation, of course, 
but in view of the results reported above, it 
makes the adducing of more data extremely
pertinent.


2 Personal communication from the consulting edi- 
tors of this journal. 


Summary 


Evidence is presented which casts doubt on 
the validity of the present nursing key of the 
Strong Vocational Interest Blank. Factors 
which may account for this finding are dis- 
cussed. 


Received April 28, 1952. 
Individual Differences in Ability to Fake Vocational Interests


Ralph Garry 


Boston University


The purpose of this study was to investigate 
individual differences in ability to fake voca- 
tional interests, to determine if reliable in- 
dividual differences in faking ability existed, 
and if such differences could be related to 
vocational selection. Three separate trials 
were made, requesting college students, who 
had previously been administered the Strong 
Vocational Interest Blank under standard 
directions, to obtain as high a score as possible 
on certain of the occupational interest scales. 
Derived faking scores for the several scales
were correlated to determine generality of
ability to fake, and also the reliability of such
faking.


Background 


The present study was initially begun in 
conjunction with the Medical Specialists Re- 
search Project, a joint undertaking of Stan- 
ford University and the Surgeon General of 
the U. S. Army, having for its purpose the de- 
velopment of test instruments designed to 
facilitate assignment and classification of doctors
into residency training programs.¹ It
was hoped that if reliable individual differ- 
ences existed, they could be used in distin- 
guishing candidates specializing in psychiatry 
from those in surgery, the assumption being 
that psychiatrists would show greater insight 
into attitudes and interests. 

Although several attempts have been made 
to devise measures of “social intelligence,” 
these so-called tests of “social intelligence” 
and “social judgment” have been little better 
than crude measures of intelligence (2, 6, 8). 
On the other hand, the results of several 
studies have suggested that the ability to fake 
scores in predetermined ways on preference- 
type tests showed promise as a possible meas- 
ure in the area of social intelligence or psy- 
chological insight (1, 4). 


1 The author is indebted to Lloyd G. Humphreys,
under whose supervision the present study was
executed.


Strong (7, p. 685) reports that “testees can 
deliberately obtain high occupational interest 
scores when they try.” Benton and Korn- 
hauser (1), interested in the use of the Strong 
Interest Blank in selection of medical school 
students, asked a group of 34 undergraduate 
college students (mainly social science majors) 
to fake as high a score on the physician scale 
as possible. The results corroborate Strong’s 
finding that faking is possible. Of greater in- 
terest was an indication that all of the group 
did not gain, giving support to the premise 
that the ability to fake occurs in differing 
degrees. 

Of the few earlier studies, one of the most 
relevant to the present was Steinmetz’s (5). 
A total of 46 junior college students, directed 
to fake high scores on teacher-administrator 
scales on the Strong Blank, made significant 
gains over original scores. Intercorrelation 
between original score, faked score, intelli- 
gence and gains made in faking showed that 
both original and faked scores correlated with 
intelligence significantly greater than .00, but 
the difference between them was not statis- 
tically significant. The correlation between 
gain made and intelligence was significantly 
negative. Steinmetz infers from this negative 
correlation that intelligence makes little con- 
tribution to the obtained faking, apparently 
overlooking the extent to which the negative r 
is an artifact of method of determining gain; 
individuals with low initial scores have much 
greater possibilities for gains. 

The extent of the relationship between in- 
telligence and the ability to fake scores is 
critically important to a conclusion that fak- 
ing ability represents “social judgment.” An 
attempt to obtain a partial correlation coeffi- 
cient between these two variables holding the 
relationship of each with initial score constant 
produced coefficients in excess of 1.00, suggest- 
ing inaccuracies in the data as presented. 
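
The point about coefficients in excess of 1.00 can be illustrated with the standard first-order partial correlation formula. The sketch below is only an illustration: the function name and the input values are hypothetical and are not Steinmetz's data. It shows that a mutually consistent set of three correlations yields a partial r within the admissible range, whereas a set that no real sample could produce does not.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical, mutually consistent inputs: the result stays within [-1, 1].
consistent = partial_corr(0.50, 0.30, 0.40)

# Hypothetical, mutually inconsistent inputs: no single sample can yield this
# triple of correlations, and the formula leaves the [-1, 1] range.
inconsistent = partial_corr(0.90, 0.90, -0.90)

print(round(consistent, 3), round(inconsistent, 3))
```

A partial r outside the range of -1.00 to 1.00 therefore points to reported correlations that cannot all be correct, which is the inference drawn above.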

In a recent study Jessen (3) found that 
parents’ responses on answering the Kuder 
Preference Record and Bell Adjustment Inventory
as they thought their children would
respond correlated .75 with child responses. 
These findings give rise to the question of how 
far such faking ability extends; that is, will a 
more general population show as high a de- 
gree of faking or is such faking ability limited 
to particular situations? 

The purpose of this study, therefore, is to 
determine the degree, extent and reliability of 
the ability to fake scores on the Strong Voca- 
tional Interest Blank for Men. 


Population 


A separate group was used on each of three 
trials. Group 1 consisted of 178 male, college 
undergraduates enrolled in a general psychol- 
ogy course. Groups 2 and 3 included 75 and 
91 students of both sexes enrolled in educa- 
tional psychology courses. The latter two 
groups were more heterogeneous with respect 
to age and vocational experience. For pur- 
poses of the tables in this report only the data 
for Groups 2 and 3 are presented. 


Procedure 


1. The Strong Vocational Interest Blank for 
Men was administered using standard pro- 
cedure to a group of sufficient size to provide 
sub-groups (successful and unsuccessful at 
faking) of fair size. 

2. Biographical data were obtained for 
each subject using a multiple choice ques- 
tionnaire. The majority of the questions 
asked for responses regarding education, 
work experience, hobbies, career choice and 
father’s occupation. The format of the ques- 
tionnaire permitted the classification of re- 
sponses to any single question into a dichot- 
omy for use with a biserial correlation coeffi- 
cient. 

3. A measure of intellectual ability was ob- 
tained. The best measure available was the 
academic grade-point averages of the subjects. 

4. The Strong Blank was readministered 
with directions to fake high scores on desig- 
nated scales. On the first trial the group was 
instructed to answer as they thought: (1) a 
carpenter would; and (2) as a physician 
would. Although the results generally coin- 
cided with results of the second and third 


trials, it was evident that the carpenter scale 
had been a poor choice, apparently being too 
easily faked. The original mean score was 
—45, mean faked score was 115 with insuffi- 
cient spread of scores to permit a test of dif- 
ferential faking ability. The low reliability of 
.39 (first versus last half corrected by Spear- 
man-Brown formula) confirmed the doubts re- 
garding the carpenter scale. 

5. In the repetitions of the experiment, four 
scales were used instead of two, obtaining fak- 
ing on one-half of each of the four scales in 
order to remain within reasonable time limits 
for testing. There are a sufficient number of 
weighted items on each half of the Interest 
Blank to provide an adequate measure of fak- 
ing. 

The four scales chosen for the second ad- 
ministration were physician, minister, lawyer 
and president of manufacturing concern. 
They were chosen because they had the lowest 
intercorrelations with physician scale, ade- 
quate reliability and, more important, because 
they represented the interest factors shown to 
be present in the Strong Vocational Interest 
Blank for Men in several factor analysis 
studies (7, p. 314 f). 

6. The reliability of the faking on the sec- 
ond and third trials was determined by pre- 
paring scoring keys for use with the IBM 
test scoring machine which provided a reli- 
ability coefficient based on odd versus even 
responses. All items with plus weights were 
given plus one weights, and all with minus 
weights were given minus one weights. 

7. In establishing a score for faking ability, 
it was apparent that a high score on the faked 
tests did not certify to high degree of faking 
ability, rather the ability to increase one’s 
original score was the measure of faking 
ability. The problem was to obtain a rela- 
tively uncontaminated measure of gain. The 
high negative correlations obtained by Stein- 
metz (5) when gain made was compared to 
original score nullifies gain made as a measure 
of faking ability, for its magnitude is a func- 
tion of initial standing. Two methods were 
tried with approximately equal results. The 
first, a ratio of gain made to gain possible was 
rejected because of the tendency of ratio 
scores to produce spurious correlations under 
certain conditions. The faking score adopted
was the difference between score obtained 
under faking directions and the score pre- 
dicted on the basis of the correlation between 
original and faked scores. This difference 
represents a measure of faking ability, inde- 
pendent of original score, which may be corre- 
lated with similar differences obtained on the 
other scales. 

Scatter diagrams were prepared to check the 
distribution of regressed faking scores² about
the regression line for predicted scores for each 
of the four occupations. This was done as an 
empirical check of the assumption that the 
difference scores used in the preceding inter- 
correlations were randomly distributed about 
the regression line, and independent of the size 
of initial score. The distributions observed 
supported such an assumption. 

8. In order to establish any generality of 
faking ability, it was necessary to account for 
any variance in the correlation between faking 
scores that is associated with intelligence, edu- 
cation or experience of vocational or avoca- 
tional nature. Faking ability, if it exists as a 
psychological characteristic of any generality, 
should have some independence from the 
aforementioned factors. Biserial correlation 
coefficients were computed using the regressed 
scores on the physician-faking and president- 
faking because of their higher reliability and 
the ease with which the groups could be dicho- 
tomized as non-informed or informed about 
the occupation judging from such data obtained
on biographical information blank.


2 The term “faking score” is used to designate the 
score based upon the difference between the obtained 
faking score and the predicted (regressed) faking 
score. 


Given such independence, the test for the 
presence of faking ability depended on low 
intercorrelations between initial scores, be- 
tween initial and faked scores, but high cor- 
relations between faked scores. This would 
show that the rank order obtained on faking 
differed from that on initial scales, either be- 
tween scales or within scales, thus indicating 
generality of faking ability. 

9. A final step in the treatment of the data 
was an item analysis of responses made on the 
faking of the physician scale, using upper and 
lower 27% of Group 1 and Groups 2 and 3 
combined. 
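
The regressed faking score of step 7 can be sketched as the residual of the faked score about a least-squares line fitted on the original score. The following is a minimal illustration, assuming an ordinary linear regression; the function names and the six pairs of scores are hypothetical and are not data from this study. It contrasts the residual with the simple gain, which remains tied to initial standing.

```python
from statistics import mean


def pearson_r(a, b):
    """Product-moment correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def regressed_faking_scores(original, faked):
    """Faked score minus the faked score predicted from the original score."""
    mx, my = mean(original), mean(faked)
    sxx = sum((x - mx) ** 2 for x in original)
    sxy = sum((x - mx) * (y - my) for x, y in zip(original, faked))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(original, faked)]


# Hypothetical raw scores for six subjects (not data from the study).
original = [-20, -5, 10, 25, 40, 55]
faked = [60, 85, 70, 110, 95, 120]

residuals = regressed_faking_scores(original, faked)
raw_gain = [f - o for o, f in zip(original, faked)]

# The simple gain correlates negatively with initial standing in these data;
# the residual is uncorrelated with it by construction.
print(round(pearson_r(raw_gain, original), 3))
print(round(abs(pearson_r(residuals, original)), 6))
```

Because the residuals are uncorrelated with the original scores by construction, any correlation among residuals from different scales cannot be attributed to initial standing.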


Results 


The data obtained in this study confirm the 
reports of previous investigators regarding the 
extent of faking that is possible on pencil- 
and-paper tests of personality and interest. 
Groups of individuals, given instructions to 
fake high scores on the Strong Vocational In- 
terest Blank for Men, are able to obtain sig- 
nificant increases in the group mean, although 
there are some individuals at all score levels 
who do not gain. The faking apparently is 
not correlated with intelligence, sex, or in- 
formation about an occupation. The biserial 
correlation coefficients between faking score 
on physician scale and grade point average 
were —.18 and .22 (Groups 1 and 2); between 
faking score and sex were —.06 and —.02, and 
for faking score and information were .02 and 
.00 (for president, manufacturing, and phy- 
sician scales with data for Groups 2 and 3 
combined). 
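
For readers unfamiliar with the biserial coefficient used here, the textbook computation can be sketched as follows. The function name and the small data set are hypothetical; the formula is the usual one, in which the difference between the group means is divided by the standard deviation of all scores and multiplied by pq/y, where y is the ordinate of the unit normal curve at the point of dichotomy.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(scores, groups):
    """Biserial correlation between a continuous score and a 0/1 dichotomy."""
    upper = [s for s, g in zip(scores, groups) if g == 1]
    lower = [s for s, g in zip(scores, groups) if g == 0]
    p = len(upper) / len(scores)
    q = 1.0 - p
    # Ordinate of the unit normal curve at the point dividing p from q.
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (mean(upper) - mean(lower)) / pstdev(scores) * (p * q / y)


# Hypothetical faking scores and an informed (1) / non-informed (0) split.
scores = [1, 2, 3, 4, 5, 6]
groups = [0, 1, 0, 1, 0, 1]

print(round(biserial_r(scores, groups), 3))
```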

Table 1 shows that consistent gains were 
made in the means on all scales, with the 


Table 1

Means and Standard Deviations of Original and Faked Raw Scores:
Group 2 (N = 75) and Group 3 (N = 91)

                    Group 2                         Group 3
               Mean           SD               Mean           SD
           Orig.  Faked   Orig.  Faked     Orig.  Faked   Orig.  Faked





President 18 
Lawyer ‘ 20 
Physician 35 
Minister 25 


19 a ° 2s 
10 20 10 12 


36 d 40 31 


21 28 21 







standard deviations remaining fairly constant, 
except for the lawyer scale. The smallest gain 
in the means, that made on the president scale 
for Group 1, is significant at greater than the
.001 level of confidence. On the whole, the
similarity of the means and standard devia- 
tions indicates the comparability of the 
groups. This does not hold for the lawyer 
scale. The decrease of 10 raw score points 
on the standard deviation is significant above 
the .001 level of confidence. The most rea- 
sonable explanation for the decrease, and 
similarly that found with the carpenter scale 
on the first trial, is the low reliability of the 
faking scores. 

It is possible that some scales are more 
easily faked by all members of a group, result- 
ing in decreased variability under faking con- 
ditions. However, it should be noted that the 
observed decrease in variability is not asso- 
ciated with the scale’s having a higher propor- 
tion of easily faked items, assuming the num- 
ber of such items to be proportional to the 
number of large scoring weights. (It was ob- 
served in item analysis of faking that all mem- 
bers of the group choose the correct response 
for interests that are obviously related to a 
given occupation.) Under such circumstances 
the number of items upon which faking differ- 
ences could obtain would be proportionately 
smaller, resulting in decreases in standard de- 
viation. Both minister and physician scale 
have a greater proportion of large scoring 
weights than lawyer scale. 

Reliability coefficients for each set of raw
faking scores are presented in Table 2, along
with estimates of the reliabilities of the regressed
scores, based on the formula for reliability
of a difference score. With the exception
of the lawyer scale, the reliability of the
faking scores is comparable to that reported
by Strong (7) for scales administered under
standard conditions.


Table 2

Reliability of Raw Faking Score and Regressed Faking
Score (estimated) for the Four Scales
for Groups 2 and 3

              Raw Faking     Regressed Score
              Score          (estimated)

President     .87            .73
Lawyer        .55            .67
Physician     .89            .79
Minister      .78            .80


Table 3

Intercorrelation of Faking Scales for Groups 2 and 3

Scales Correlated                        Group 2       Group 3       Mean r

President, mfg. concern, and lawyer      [illegible]   .27           .28
President, mfg. concern, and physician   [illegible]   .35           .34
President, mfg. concern, and minister    [illegible]   [illegible]   .05
Lawyer and physician                     [illegible]   [illegible]   .06
Lawyer and minister                      [illegible]   [illegible]   .13
Physician and minister                   [illegible]   [illegible]   .12
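
The split-half procedure behind the raw faking reliabilities (unit weights, odd versus even items, with the Spearman-Brown correction mentioned in step 4) can be sketched as follows. This is an illustration only: the function names and the item matrix are hypothetical, and the IBM machine scoring actually used is not reproduced.

```python
from statistics import mean


def pearson_r(a, b):
    """Product-moment correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Reliability of a test lengthened by `factor`, given the shorter r."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def split_half_reliability(item_scores):
    """item_scores: per subject, a list of unit-weighted items (+1, 0, -1)."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# Hypothetical unit-weighted responses: five subjects by six items.
items = [
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, -1, 0],
    [0, 0, -1, 1, 0, -1],
    [-1, -1, 0, -1, -1, 0],
    [1, 1, 0, 0, 1, 1],
]

print(round(split_half_reliability(items), 3))
```

The same correction applies whether the halves are odd and even items, as here, or first and last halves, as with the carpenter scale on the first trial.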

Table 3 presents the correlations between 
the faking scores on the various scales. These 
indicate that there is no marked general faking
ability: all r’s are under .36. The
fact that nearly all r’s are positive, however, 
indicates the possibility of weak general faking 
ability, which could be proved or disproved in 
a subsequent trial by using scoring keys based 
on item analysis of responses of upper and 
lower faking groups. If true, one would ex- 
pect increased correlations, using such keys. 

The correlation between two faking scores 
is independent of the initial correlation be- 
tween scales (as reported by Strong) judging 
from second order partial correlation coeffi- 
cients computed between minister and presi- 
dent scales, which changed negligibly from 
the given .00 r between original and faking 
scores. 

Item analysis of faking responses to the physician
scale using top and bottom 27% of
groups indicates that the differences obtained 
result from less than half of the weighted 
items. Both groups choose the obvious re- 
sponses of physicians. Successful faking is 
dependent on predicting the more subtle dif- 
ferences in interests. Significant differences 
are obtained on as many unweighted as 
weighted items, suggesting considerable inac- 
curacy in faking. However, the differences 
obtained do not result from a willingness of 
the successful faking group to commit themselves
to a like or dislike response while the
non-fakers remained neutral. 


Summary 


Two groups of 75 and 91 college students, 
instructed to fake high scores on four scales 
of the Strong Interest Blank after taking it 
under standard directions, demonstrated: 

1. Significant increases in mean scores on 
all scales. 

2. Split-half reliability of faking ranging 
from .56 to .89, with three scales exceeding 
.75, indicating a high degree of consistency. 

3. Intercorrelations between faking scores 
(the difference between obtained and regressed 
fake score) ranging from —.05 to .35, suggest- 
ing a low degree of general faking ability in- 
volved, with most faking being specific to the 
given scale. 

4. Faking ability was not correlated (bi- 
serial r) with intelligence, sex, or information 
regarding the occupation. 

5. The more successful in faking (in an
item analysis) predict substantially more of
the subtle occupational interests, whereas all
predict the obvious.
The Reliability of Self-Ratings as a Function of the Amount
of Verbal Anchoring and of the Number of Categories 
on the Scale 


A. W. Bendig 
University of Pittsburgh 


One of the first problems faced by the con- 
structor of a rating scale is the paucity of ex- 
perimental literature on the optimal character- 
istics of such scales. Information is needed, 
for example, as to the effect of variations in 
the number of scale categories and in the 
amount of verbal definition or anchoring of 
the scale categories upon both the reliability 
and validity of the scales. The scale should 
not be so coarse as to lose some of the dis- 
criminative ability of the rater, nor so fine that 
error variance is added to the ratings because 
the scale categories call for finer judgments 
than the rater is capable of making. As to 
anchoring, presumably the more defining of 
scale categories and the more objective are 
such definitions the greater will be inter-rater 
measures of reliability. However, in self-rat- 
ings, such as are commonly used in personality 
studies (3), objective and extensive verbal 
definition of scale categories may result in an 
undesirable loss in the “projective” elements 
present in such self-ratings. 

Two reports have discussed the effect of 
variations in number of scale categories upon 
reliability. Symonds (9) based a rational 
analysis of the problem upon Kelley’s correc- 
tion of an obtained correlation coefficient for 
coarseness of grouping in the measured vari- 
ables (7, p. 168). He concludes that the 
reliability of the scale should increase as the 
number of scale categories increases, but that 
this increase in reliability is minor above nine 
categories. In view of the increased difficulty 
of the task for the rater, Symonds concludes 
that the optimal number of categories is from 
seven to nine. The empirical results of 
Champney and Marshall (4) question Sy- 
monds’ analysis. In their study social work- 
ers rated visited families as to sociability on 
two forms of a graphic rating scale. These 
ratings were quantified by measuring the 


graphic ratings using a millimeter scale and 
also using a coarser centimeter scale. The 
correlation between two forms of the scale (80 
families rated twice) was significantly higher 
for the millimeter scale when compared with 
the centimeter scale (0.77 compared with 
0.67). Such a magnitude of increase is much 
greater than could be predicted for Symonds’ 
analysis. Bendig and Hughes (2) found that 
an information analysis of rating scales differ- 
ing in number of categories indicates that the 
absolute amount of information transmitted 
by the scale increased with increasing numbers 
of categories, but that the increments became 
smaller with longer scales. 

The purpose of the study reported below 
was to investigate the effect of variations in 
the number of scale categories and amount of 
verbal anchoring upon inter-judge (1) reli- 
ability estimates of self-ratings of individuals 
and of groups. 


Procedure 


Scales. Fifteen different forms of a numeri- 
cal rating scale were constructed from the 
combinations of five different numbers of scale 
categories (3, 5, 7, 9, or 11) and three condi- 
tions of verbal anchoring of the categories 
(center category defined, both end categories 
defined, or center and end categories defined). 
The lowest category on each scale was given a 
numerical value of 1, the highest category 
was rated as 3, 5, 7, 9, or 11, with intermediate
scale categories numbered accordingly. 

Subjects. The Ss were 225 undergraduate 
students in introductory and social psychology 
classes. The fifteen scales were randomly dis- 
tributed among the subjects with 15 raters 
using each of the scales. 

Instructions. Each scale was mimeographed
on a single page containing the stimuli
to be rated and instructions to the rater.
The stimuli were the names of twelve foreign
nations, ranging from well-known countries
such as France and Canada to lesser known
nations like Sweden and Egypt. The Ss were
asked to rate themselves on how much they
knew about the political, economic, geographic,
and sociological characteristics of
each country. Emphasis in the instructions
was placed upon the Ss rating their own information
about each country and the three
verbal statements used to anchor scale categories
were:


I know a great deal about this country.
I know something about this country.
I know very little about this country.


Table 1

Analysis of Variance of Group and Individual Reliability Coefficients of Rating Scales Differing in the
Number of Scale Categories and Amount of Verbal Anchoring of the Scale

                                    Group Reliability                 Individual Reliability
                                Sum of      Mean                      Sum of      Mean
Source of Variation       df    Squares     Square    F               Squares     Square    F

Number of categories       4      91.91      22.98    [illegible]      150.53      37.63    —
Amount of anchoring        2     274.17     137.08    [illegible]      455.24     227.62    2.27
Interaction                8     793.91      99.24    [illegible]     1496.97     187.12    1.87
Within groups             30    2889.25      96.31                    3002.90     100.10

Total                     44    4049.24                               5105.64


I know a great deal about this country. 
I know something about this country. 
I know very little about this country. 


Results 


Each group of 15 raters using one of the 15 
different scales was randomly subdivided into 
three groups containing 5 raters each. An 
estimate of the reliability of group ratings for 
each subgroup was computed using the tech- 
nique developed by Hoyt (6) and a similar 
estimate of the reliability of individual ratings 
was found using the intraclass method de- 
scribed by Snedecor (8, pp. 243-246) and 
elaborated upon by Ebel (5). The Hoyt 
procedure is designed to answer the question 
of the reliability of the mean ratings of five 
raters on the above described judgmental task 
and the Snedecor method estimates the reli- 
ability of a single rater on the same task. 
The resulting 45 group reliability coefficients 
were analyzed within the framework of a fac- 
torial design for the effect of variations in 
number of scale categories, amount of scale 


anchoring, and the interaction of these two 
variables. A similar analysis of variance was 
computed on the 45 individual reliability co- 
efficients. The results of these two analyses 
can be found in Table 1. It can be seen that 
neither of the two main variables contributed 
significantly to the total variability of either 
the group or the individual reliability coeffi- 
cients. Also, in neither case was the interac- 
tion term significant when tested against the 
within-groups (error) mean square. 
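
The significance tests summarized above can be checked from the sums of squares and degrees of freedom in Table 1. The sketch below is only a verification aid: the function name is hypothetical, the assignment of values to rows follows the arithmetic of the table (each set of component sums of squares adds to its total), and the 5 per cent critical values quoted in the comments are the conventional tabled values for 30 error degrees of freedom.

```python
def f_ratios(effects, error):
    """effects: name -> (sum of squares, df); error: (sum of squares, df)."""
    ms_error = error[0] / error[1]
    return {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}


# Sums of squares and df as read from Table 1.
group = f_ratios(
    {
        "categories": (91.91, 4),
        "anchoring": (274.17, 2),
        "interaction": (793.91, 8),
    },
    error=(2889.25, 30),
)
individual = f_ratios(
    {
        "categories": (150.53, 4),
        "anchoring": (455.24, 2),
        "interaction": (1496.97, 8),
    },
    error=(3002.90, 30),
)

# Tabled 5 per cent points: F(4, 30) = 2.69, F(2, 30) = 3.32, F(8, 30) = 2.27.
for label, ratios in (("group", group), ("individual", individual)):
    print(label, {name: round(f, 2) for name, f in ratios.items()})
```

Every ratio computed this way falls below its 5 per cent critical value, in agreement with the statement that neither main variable nor the interaction was significant.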

The three subgroups using each of the fif- 
teen scales were pooled and new Hoyt and 
Snedecor reliability estimates computed. In 
this analysis each of the estimates is based 
upon the ratings of 15 subjects. Since the in- 
teraction of the main variables was insignifi- 
cant, the three anchor groups were further 
pooled and group and individual reliability co- 
efficients computed for each of number-of- 
scale-categories groups. These estimates are 


Table 2 


Average Group and Individual Reliability Coefficients 
(Decimal Points Omitted) for Each Number 
of Categories on Rating Scales 








Number of Scale 
Categories 


Number 
Type of of 
Reliability Raters 3 5 9 





5 68 69 
Group 15 89 89 
45 96 : 96 


5 28. CS 33 
Individual 15 34. «(CS 35 
45 oe: 3336 
Table 3


Average Group and Individual Reliability Coefficients 
(Decimal Points Omitted) for Each Amount 
of Verbal Anchoring of Rating Scales 








Amount of Anchoring 
Number 
Type of of 
Reliability Raters 





Center 
End and End 


Group 5 Gi 6° 6 71 
15 86 87 89 


Center 





Individual 29 28 35 
29 31 36 





based upon 45 raters in each group. The 
average group and individual reliabilities for 
each of the category groups can be found in 
Table 2. In general, both the group and in- 
dividual reliabilities were constant when 3, 5, 
7, or 9 scale categories were used. However, 
in all instances the reliability declined some- 
what when 11 scale categories were used and 
this decrease in reliability becomes more evi- 
dent as the number of raters increases. 
Similar average group and individual reli- 
abilities for the three anchor groups are given 


in Table 3. Increased reliability can be noted 
with increased amounts of anchoring with the 
greatest increase occurring between the group 
with both ends anchored and the group with 
center and end anchoring. 
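
The two kinds of estimate compared in Tables 2 and 3 can be illustrated with a small analysis-of-variance sketch. It assumes a complete stimuli-by-raters matrix and removes between-rater differences from the error term; whether the original computations treated rater differences in this way is an assumption, and the function name and ratings are hypothetical. The group estimate is the reliability of the mean of k raters and the individual estimate is the intraclass reliability of a single rater.

```python
def anova_reliabilities(ratings):
    """ratings[i][j] = rating of stimulus i by rater j (complete matrix)."""
    n = len(ratings)      # number of stimuli
    k = len(ratings[0])   # number of raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_resid = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    group = (ms_rows - ms_resid) / ms_rows
    single = (ms_rows - ms_resid) / (ms_rows + (k - 1) * ms_resid)
    return group, single


# Hypothetical self-ratings: four stimuli (rows) by three raters (columns).
ratings = [
    [5, 4, 5],
    [3, 3, 4],
    [2, 1, 2],
    [4, 4, 3],
]

group_r, single_r = anova_reliabilities(ratings)
k = len(ratings[0])
stepped_up = k * single_r / (1 + (k - 1) * single_r)  # Spearman-Brown
print(round(group_r, 3), round(single_r, 3), round(stepped_up, 3))
```

Under these assumptions the group estimate equals the single-rater estimate stepped up by the Spearman-Brown formula for k raters, which is why group reliabilities rise with the number of raters pooled while individual reliabilities remain near the same level.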


Discussion 


The results reported above suggest that the 
reliability of group or individual self-ratings 
is little affected by variations in the number 
of scale categories within the limits of from 
3 to 9, but both individual and group reliability
begin to decline when 11 categories are
used. This conclusion is in opposition to what
would be predicted by Symonds’ analysis (9),
but suggests that the point made by Champney
and Marshall (4), i.e., that rating tasks
beyond the discriminative ability of the rater
add error variance to the ratings, is confirmed
in this instance. Presumably self-ratings using
an eleven-category scale present the rater
with an introspective problem that is slightly 
too difficult and the reliability of his responses 
begins to decrease. While Champney and 


Marshall found increased reliability with in- 
creased refinement of the scale, we have found 
opposite results. An explanation of this dif- 
ference probably lies in the type of rating task 
presented to the subject. Champney and 
Marshall had their subjects rate the observed 
behavior of others: we had our subjects rate 
their own introspections. Obviously we can- 
not generalize our results to ratings of objec- 
tive behavior, but must limit ourselves to the 
behavior herein investigated. 

As to anchoring of the scale, increased ver- 
bal definition of the categories resulted in 
slightly increased reliability. The important 
anchor seemed to be that defining the center 
category. There was only a slight difference 
between the groups that had only the center 
category defined when compared with the 
group having only the two end categories 
anchored, but the addition of a center anchor
to the latter scale appreciably raised its reli- 
ability. The lack of interaction between num- 
ber of categories and amount of anchoring 
may be attributable to the fact that the cate- 
gories added to the three-category scale were 
unanchored categories inserted between the 
center and end categories. Possibly longer 
scales, each of whose categories was verbally 
anchored, might not have shown a drop in reli- 
ability between 9 and 11 scale points. 

Synthesizing these results with those pre- 
viously reported (2) we recommend that in 
constructing self-rating scales 9 categories 
should be used, since: (a) they are as reliable 
as shorter scales; and (b) they provide more 
information. However, adding additional 
categories provides some increase in informa- 
tion at the sacrifice of scale reliability. It is 
further concluded that more verbal anchoring 
of the scale will increase both the reliability 
and the information transmitted by the scale. 


Summary 


A total of 225 college students rated them- 
selves as to how much they knew about twelve 
foreign countries. The rating scales differed 
in number of scale categories (3, 5, 7, 9, or 11)
and in amount of verbal anchoring of the scale 
points (center category defined, end categories 
defined, or both center and end defined). The 
reliabilities of individual and of group ratings 
for each scale were computed by intraclass
methods. Results indicated equal reliability 
for scales having 3, 5, 7, or 9 categories, but 
a decrease in reliability for 11 categories. The 
reliability of the scales increased with added 
scale anchoring. The discussion emphasizes 
that the results can be generalized only to self- 
ratings and not to ratings of observed be- 
havior. 


Received May 9, 1952. 
An Analysis of Engineering Entrance Examinations


Harry W. Case 
Department of Engineering, University of California, Los Angeles 


The problem of measuring engineering 
achievement and aptitude is certainly not new. 
Indeed, this is an area of investigation that
has been under study for the last twenty years. 
Today if the studies related to this field are 
compiled, they add up to an impressive num- 
ber. In appraising engineering aptitude the 
exploratory investigations have ranged from 
determining the interrelationship existing be- 
tween the scores of general capacity tests and 
measures of success in an engineering curricu- 
lum to attempts to tease out the specific fac- 
tors making for success in engineering. For 
example, interrelationships as high as .62 (5) 
have been obtained between success in the first 
and second semesters and the American Coun- 
cil Psychological Examination (a general ca- 
pacity measure). 

When tests have been designed specifically 
to measure factors believed necessary to assure 
success in mastery of the engineering curricu- 
lum, the reported correlation coefficients are 
among those that are usually considered high 
for aptitude testing. In one study conducted 
in twelve schools the median correlation with 
first term grade averages and the Pre-Engi- 
neering Inventory was .60 (3). 

At the College of Engineering, University 
of California, Los Angeles campus, a study of 
some of the variables influencing student suc- 
cess or failure has been underway since 1945. 
This engineering college is in many respects 
somewhat unique in the engineering educa- 
tional field since it does not follow the usual 
sequence of undergraduate specialization in 
civil, electrical, and mechanical engineering 
but rather emphasizes the basic principles es- 
sential to all of these fields. One of the un- 
usual features is the gradual incorporation into 
the program of the premise that engineering 
should utilize not only the laws of the physical 
sciences but also those derived from the life 
sciences (1). 


Subjects 


Although the investigation of the factors 
that make for success in this engineering col- 
lege has been underway for a number of years, 
many of the students who have entered and 
proceeded either to graduation or withdrawal 
could not be used as subjects in this study. 
The elimination of numerous cases (such as 
those in which courses were repeated to raise 
a grade, in which previous specialized and 
related military training existed, or in which 
pre-entrance engineering extension division 
study had been taken, along with other related 
and influencing extraneous variables) greatly 
reduced the total number of cases available 
for study. From a 
total of well over a thousand potential sub- 
jects the actual correlations were obtained for 
N’s which ranged from 144 to 444. Even 
though the reduction in the total number of 
subjects available for the measurement of the 
various interrelationships is regrettable, it is 
believed that matching the subjects in terms of 
previous training more than offsets the loss of 
mass data. 

It is probably somewhat unfortunate that in 
the majority of studies published in the last 
ten years no mention has been made as to 
whether the subjects have been matched in 
terms of previous training and preparation, 
although it is recognized that many students 
who enter as freshmen have had prior college 
or military training which may influence their 
success in the first two academic years. 


Procedure 


All incoming freshmen were given the com- 
plete Pre-Engineering Inventory prior to en- 
trance. This is a special abilities test battery 
using the “task-simulation” technique and was 
developed as a joint project of the Engineers’ 
Council for Professional Development, The 
American Society of Engineering Education, 
and the Carnegie Foundation for the Advance- 
ment of Teaching. It consists of the seven 

[Table 1. Tetrachoric Intercorrelations* of P.E.I., Jr. Status Examinations, Certain Subject Areas, and Semester Grades. The table body is not legible in this copy; surviving row labels include PEI Gen. Verb., PEI Tech. Verb., PEI Sci. Mat., Jr. Status Total, and Lib. Arts H. S.] 

*r’s in boldface are significant at the 1% level. 

following named tests: General Verbal Ability, 
Technical Verbal Ability, Comprehension of 
Scientific Materials, General Mathematical 
Ability, Comprehension of Mechanical Prin- 
ciples, Spatial Visualizing Ability, and Under- 
standing of Modern Society. An examination 
of the material of each section of the test in- 
dicates that its face validity closely approxi- 
mates the name. 

If a student was moving from sophomore 
to junior status, either within the college or 
by means of a transfer from a junior college, 
he was required to take the Junior Status Ex- 
amination. This is an achievement examina- 
tion battery of the multiple choice type con- 
sisting of five separate tests, each covering a 
specific field: Chemistry, Physics, Mathematics, 
English, and Drawing. The mathematics 
and drawing tests are University of California, 
College of Engineering Examinations, while 
the other three are from the Cooperative Test 
Series, Higher and College Level. 

In addition to these examinations, the stu- 
dent’s previous high school grade record was 
used for evaluation and admission. For ad- 
mission purposes the high school record was 
divided into two categories: those subjects 
which were classed as liberal arts and those 
subjects which were termed pre-engineering. 
The pre-engineering group of courses consisted 
of mathematics, physical sciences (chemistry 
and physics), mechanical drawing, and English. 
The courses remaining after these were 
deducted from the transcript were loosely 
classed as liberal arts. 

Tetrachoric r’s and the Standard Error¹ for 
each of these r’s were calculated between the 
entrance devices described above and the first 
four semesters of work as well as the grouped 
subject areas of chemistry, mathematics, phys- 
ics, and drawing. These correlations, as well 
as the internal interrelationships, are shown 
in Table 1. 
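
The computation described above can be sketched as follows. The standard-error function follows the approximation quoted in the footnote from Garrett (1.5 times the product-moment SE, with N − 1 under the radical as the footnote appears to print it). The article does not say how the tetrachoric r's themselves were computed, so the cosine-pi approximation shown here is an assumption chosen for illustration, and the function names and cell counts are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric r for a 2x2 table.

    a and d are the concordant cells (high/high and low/low);
    b and c are the discordant cells. This is a stand-in method,
    not necessarily the one used in the study.
    """
    if b * c == 0:
        return 1.0   # no discordant cases: treated as a perfect positive relation
    if a * d == 0:
        return -1.0  # no concordant cases: treated as a perfect negative relation
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


def se_tetrachoric(r, n):
    """Approximate SE of a tetrachoric r: 50% above the product-moment SE."""
    return 1.5 * (1.0 - r ** 2) / math.sqrt(n - 1)


# Hypothetical 2x2 split of test score (high/low) against grade (high/low).
r_t = tetrachoric_cos_pi(40, 10, 10, 40)
# N's in the study ranged from 144 to 444; the smallest gives the widest SE.
print(round(r_t, 3), round(se_tetrachoric(r_t, 144), 3))
```

Because the SE shrinks with the square root of N − 1, the same r carries noticeably less uncertainty at N = 444 than at N = 144, which is worth keeping in mind when comparing entries of Table 1 that rest on different N's.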


Criteria 


The question of the reliability of specific sub- 
ject grades as well as the reliability of the 
semesters’ grades needs to be considered, be- 
cause if the reliability is low, little can be done 
to increase the validity of a test for selection 
purposes. The high intercorrelations existing 
between chemistry and physics grades (.80), 
chemistry and mathematics grades (.71), and 
physics and mathematics grades (.86) would 
seem to indicate that some reliability exists, 
since the material learned in these three 
courses is related. Similarly, the fairly con- 
sistent intercorrelations between the first four 
semesters would appear to substantiate this 
belief. If the intercorrelations between semes- 
ters may be taken as an index of reliability, 
the reliability may be said to be as high as 
the relationship obtained between the measur- 
ing instruments and the criteria, i.e., tests and 
high school grades versus college semester and 
subject grades. 


Results 


An examination of the correlations in Table 
1 reveals some interesting trends. One of the 
first immediately noticeable is the magnitude 
of the intercorrelations existing between the 
sections of the P.E.I. (Pre-Engineering Inven- 
tory), when these are compared with the r’s 
existing for the various sections of the Junior 
Status Examination. One possible explana- 
tion for this difference is that it is easier to 


¹ The SE’s were estimated by applying the formula: 

$$SE_{r_t} = \frac{1.5\,(1 - r^2)}{\sqrt{N - 1}}$$

According to Garrett (2), “An approximation to the 
SE of a tetrachoric r may be found in the following 
way: the $SE_{r_t}$ is about 50% higher than the SE of an 
equivalent product-moment r. . . .” 


segregate the subject areas covered in an 
achievement type examination into measur- 
able units with little overlap. The substantial 
overlap between sections of the P.E.I. arouses 
a question as to the success with which it will 
predict the grades for a total semester and as 
to the differential value of its various sections 
for specific subject areas. 

Both the P.E.I. Total score and Composite 
score predict the four semesters’ grades fairly 
well, the one exception being a .32 correlation 
between the first semester’s work and the 
P.E.I. Composite score. The other correla- 
tions between the semesters’ grades and the 
P.E.I. Total and Composite scores range from 
.46 to .54, which is close to the median of .60 
reported by Johnson for twelve engineering 
colleges (3). On the other hand, the various 
sections of the P.E.I. show fairly low correla- 
tions with grades for specific subject areas. 
The highest single correlation existing between 
a subject area and a section of the P.E.I. was 
.46 for the P.E.I. Scientific Materials and 
Physics grades. This same lack of relationship 
and inability to discriminate between 
subject areas was found by Moredock (4). It 
would appear, therefore, that while the test 
shows usefulness in predicting over-all success 
for the first two years of engineering curriculum 
it is of little use in evaluating potential 
student success for specific subjects. 

At this point it is perhaps desirable to note 
that the P.E.I. Total and Composite scores 
show little relationship with grades obtained 
in high school “pre-engineering” subjects, or 
with grades obtained in those high school sub- 
jects which have been designated as “liberal 
arts.” This low intercorrelation, combined 
with the fact that the “pre-engineering” high 
school subjects relate at a median of .42 with 
the first four semesters of college work, has al- 
lowed the P.E.I. Total score and the “pre- 
engineering” high school grade scores to be 
combined into a successful selection device. 
It would appear desirable eventually to design 
a new examination which would show less 
overlap between its sections and greater differ- 
ential value for subject areas. 

In an analysis of the results of the Junior 
Status Examination, which is of the achievement 
type, it will be seen that the intercorrelations 
for its sections range from .11 to .51. 
These low intercorrelations may in turn be 
responsible for the higher r’s of the examina- 
tion with the specific subject areas. The dif- 
ferential value is highest for chemistry, mathe- 
matics, and drawing. The physics section of 
the examination, which has its highest correla- 
tion with chemistry grades, was a shortened 
version restricted to Mechanics and Electricity. 
The intercorrelations of both “pre-engineering” 
and “liberal arts” high school 
subject grades with the total score of the 
Junior Status Examination are also greater 
than those obtained with the P.E.I. The cor- 
relations of the examination and the first four 
semesters of work range from .31 to .74. 

Two additional correlations which are not 
included in Table 1 have been obtained for the 
total of the Junior Status Examination and 
success in the fifth and sixth semesters of 
work. The correlation between the fifth se- 
mester of work and the examination is .65, 
and the sixth semester is .58. These two r’s 
were obtained by the Pearson product moment 
method and are based upon 100 and 68 cases 
respectively. 

It should perhaps be noted that while this 
paper has been devoted to an analysis of the 
interrelationships existing between the results 
of the examinations and high school grades 
when used for entrance evaluation and the 
grades received in the first two years of engineering 
college, the examinations have proved 
useful in many other ways that are difficult to 
quantify. For example, information obtained 
from one of the examinations has been used in 
conjunction with a diagnostic interview to determine 
the areas in which remedial work is 
needed. 


Conclusions 


1. The Pre-Engineering Inventory shows a 
consistent correlation with the grades from the 
first four semesters of work, which makes it 
useful as a selection device. 

2. The sections of the Pre-Engineering Inventory 
show no clearly defined relationship 
with specific subject areas of the kind that 
would make it useful for differential selection 
within engineering. 

3. The Pre-Engineering Inventory shows a 
low interrelationship with “pre-engineering” 
subject high school grades. 

4. High school “pre-engineering” subject 
grades show a consistent correlation with 
grades from the first four semesters of engi- 
neering college work. 

5. An achievement examination such as the 
Junior Status Examination shows both greater 
differential value and greater over-all relation- 
ship with semester grades. 


Received March 10, 1952. 
Differential Sex Responses to Items of the MMPI 


L. E. Drake 


Student Counseling Center, University of Wisconsin 


In a rather extensive study of the MMPI 
some incidental evidence has come forth which 
appears important enough to warrant an early 
report. The frequency of Yes, No, and ? re- 
sponses to each of the 550 items of the card 
form was obtained separately for 2,270 undergraduate 
male students and for 1,148 unmarried, 
undergraduate female students.¹ All 
were enrolled in the University at the time of 
testing and none obtained an L score over 70 
or an F score over 80. 

Excluding items to which 90 per cent or more 
of both groups responded in the same direc- 
tion and excluding those to which 10 per cent 
or less responded in the same direction, 306 
items yielded critical ratios between the sexes 
ranging from 2.0 to 32.3. A total of 43 items 
was selected from these 306. These were 
items that 50 per cent or more of the females 


1 Tables of frequency counts for each item by sex 
have been deposited with the American Documenta- 
tion Institute. Order Document 3860 from American 
Documentation Institute, c/o Library of Congress, 
Washington 25, D. C., remitting $1.25 for photocopies 
(6 × 8 inches) readable without optical aid or $1.25 
for microfilm (images one inch high on standard 35 
mm. motion picture film). 


responded to in a direction in which less than 
50 per cent of the males responded. Answer 
sheets for 100 males and 99 females not in- 
cluded in the original groups were scored for 
the 306 items and the 43 items. The resulting 
coefficient of correlation was +.80. 
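
The item screening above rests on critical ratios between the proportions of males and females answering an item in a given direction. The note does not give the formula used, so the pooled-proportion form below is an assumption, a minimal sketch of one conventional way to compute such a ratio. The group sizes are those reported above; the item counts and the function name are hypothetical.

```python
import math


def critical_ratio(x1, n1, x2, n2):
    """Difference between two proportions divided by its pooled standard error.

    x1 of n1 and x2 of n2 are the numbers in each group answering an item
    in the keyed direction.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se


# Hypothetical item: 700 of the 1,148 females and 900 of the 2,270 males answer "Yes".
print(round(critical_ratio(700, 1148, 900, 2270), 1))
```

With groups of this size even modest percentage differences yield large critical ratios, which is consistent with 306 items clearing the 2.0 threshold.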

Answer sheets for 3,229 males and 1,612 fe- 
males were then scored with the 43-item key. 
Only 2% of the females obtained a score as 
small as or smaller than the mean score for 
the males and only 2% of the males obtained 
a score as large as or larger than the mean 
score for the females. That this sex difference 
is reliable is further indicated by the fact that 
a coefficient of correlation of +.80 was ob- 
tained between scores (43-item scale) ob- 
tained by 474 males and 224 females on the 
group form taken at time of entrance to the 
University and the card form taken up to one 
year later. 

It is quite apparent that sex is an important 
factor in establishing criterion groups, espe- 
cially for scale construction for this type of 
inventory. 


Received October 23, 1952. 
Early publication. 
A Study of Medical Students with the MMPI: III. 
Personality and Academic Success 


William Schofield 


University of Minnesota 


Two previous papers in this series have re- 
ported general normative data for samples of 
medical students studied with the Minnesota 
Multiphasic Personality Inventory (1) and 
the nature of changes in the MMPI profiles of 
students from the freshman to junior years of 
the medical curriculum of the University of 
Minnesota (2). This paper, the last in the 
series, is concerned with the relationships be- 
tween MMPI profiles and academic success. 


Class Standing and MMPI Profile 


The Dean of the Medical Sciences provided 
data on the total honor point ratios at the 
completion of the junior year of the members 
of the class used in this study. The Student 
Counseling Bureau provided the American 
Council on Education Psychological Examina- 
tion (ACE) scores of the students. With 
these data at hand, it was possible to select 
students from the upper and lower quarters of 
the class who were matched for scholastic 
aptitude as measured by the ACE. These 
matched groups were then studied for similar- 
ities and differences on the MMPI. 

The forcing of homogeneity on the aptitude 
variable resulted in very small samples (11 
students each) from the upper and lower quar- 
ters, but this was considered preferable to the 
use of larger N’s with uncontrolled aptitude 
variance. The differences between the ACE 
scores of matched upper and lower quarter 
students ranged from zero to nine percentile 
points, with an average difference of five per- 
centile points. 

Figure 1 shows the mean freshman year 
profiles of the upper and lower quarter samples. 
Three of the clinical scales reveal a 
statistically reliable difference in the mean 
scores of the two groups. The lower quarter 
group had reliably higher mean scores on the 
Hy, Pd, and Sc scales. Table 1 presents the 
data for these comparisons. The differences 
between the two groups are seen to be limited 
to a very few scales and, while statistically 
reliable, are not great. In general, as seen in 
Figure 1, the profiles of the upper and lower 
quarter students are similar, particularly in 
terms of the relative elevations of the “character 
structure” scales of the right side of the 
profile. To the degree that the good and 
poor achievers are distinguished by the 
MMPI, the distinction appears in the relative 
degree of hysteroid, psychopathic, and schizoid 
tendencies; the poorer achievers are characterized 
by a tendency to unrealistic appraisals 
of their environment, unhappy social 
relationships, and autistic rumination. Also, 
the poor achievers show a general tendency 
toward a relatively unsophisticated denial of 
personal weaknesses and the expression of an 
idealized self-concept (L). In this regard, the 
students who work up to capacity tend to 
manifest a more realistic self appraisal. 

Fig. 1. Mean freshman year MMPI profiles of 
samples of medical students from the top and bottom 
quarters of their class at the end of the junior year. 
(N = 11). 

As an approach to testing the predictive 
significance of the group differentiations 
turned up in this comparison of mean profiles, 
the total class of students was sorted into two 
groups: (1) a group each member of which 
had an MMPI profile characterized by one 
or both of the two highest scores falling on the 
Hy, Pd, or Sc scales; and (2) a group whose 

Table 1 

Means and Standard Deviations of the Freshman MMPI Scores for Samples of Upper Quarter and 
Lower Quarter Medical Students Matched for Scholastic Aptitude ¹ (N = 11) 

Scale    Upper Quarter Mean 
L         1.2 
F ²       (illegible) 
K        57.1 
Hs       47.4 
D        47.4 
Hy       49.3 
Pd       53.0 
Mf       64.0 
Pa       49.7 
Pt       50.9 
Sc       51.0 
Ma       59.7 

[The Upper Quarter Sigma, Lower Quarter Mean and Sigma, and t columns are not legible in this copy; two t values, 2.06* and 1.28, survive without their scale labels.] 

¹ Academic standing determined from honor point ratio at end of junior year. Upper and lower quarter 
students matched for ACE. 
² Statistics based on raw score data; for raw scores on L which are less than 3, arbitrarily T scores of 50 are set. 
* Significant at 5% level. 
** Significant at 1% level. 


profiles did not have the above characteristics. 
From these two groups, two smaller samples 
were drawn so that each member from the 
group with Hy, Pd, or Sc high points was 
matched with a member from the other group 
for ACE score. Then a study was made of the 
honor point ratios of these two samples which 
were equated for scholastic aptitude but dif- 
ferentiated by the presence and absence of 
certain scales as high points in their MMPI 
profiles. Table 2 reports the mean honor 
point ratios (HPR) of these two samples. 
The group characterized by profiles with high 
points on the Hy, Pd, or Sc scales yielded 
a mean HPR clearly inferior to that of the 
group not so characterized. However, since 
the variances of the two groups differed signifi- 
cantly, it was not possible to test for the reli- 
ability of the difference between the means. 
It may be concluded, nevertheless, that these 
samples do not support the hypothesis that the 
populations of which they are representative 
have identical distributions of honor point 
ratios. 
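
The variance comparison mentioned in this paragraph can be checked from the two standard deviations reported in Table 2 (.22 and .36). The following is a minimal sketch, assuming the reported F is the ratio of the larger variance to the smaller; the function name is hypothetical, and the group sizes needed to look up a significance level are not stated in the text.

```python
def variance_ratio(sigma_a, sigma_b):
    """F ratio of two variances, larger over smaller, from standard deviations."""
    larger, smaller = max(sigma_a, sigma_b), min(sigma_a, sigma_b)
    return (larger / smaller) ** 2


# Sigmas of the honor point ratios for the two matched samples (Table 2).
print(round(variance_ratio(0.22, 0.36), 2))
```

The result agrees with the F of 2.68 printed under Table 2, which supports reading that value as a test of the two variances rather than of the means.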

Figure 2 shows the actual distribution of 
HPR’s for the two samples. The greater 


range of achievement in the group not char- 
acterized by high points on Hy, Pd, or Sc is 
clear from this figure. While there is clear 
overlap of the two distributions, a cutting line 
at HPR = 1.6 shows only 23% of the group 
with the specified high points to have HPR’s 
larger than this value, while 62% of the other 


Table 2 

Means and Standard Deviations of the Honor Point 
Ratios of Two Groups of Medical Students 
Differentiated by MMPI High Points 
and Matched for ACE Scores 

              Honor Point Ratios (b)      ACE %ile 
Group (a)     Mean      Sigma             Mean     Sigma 
A             1.48      .22               64.7     22.8 
B             1.80      .36               63.9     21.9 

F = 2.68* 

(a) Group A had profiles for which one or both of the 
two highest scores fell on the Hy, Pd, or Sc scales. 
Group B did not have either of their two highest scores 
on the Hy, Pd, or Sc scales. 
(b) Total honor point ratio at end of junior year. 
* Significant at 5% level. 

Fig. 2. Distribution of honor point ratios of two groups of medical students differentiated by 
MMPI high points and matched for ACE scores.* 

* Group A had profiles for which one or both of the two highest scores fell on the Hy, Pd, or Sc scales. 
Group B did not have either of their two highest scores on the Hy, Pd, or Sc scales. 


group fall above HPR = 1.6. At the other 
end of the distribution, while 43% of the Hy- 
Pd-Sc high point group have HPR’s < 1.5, 
only 14% of the other group fall below this 
score. It appears that the presence of the 
highest or next to highest score of a medical 
student’s profile on the Hy, Pd, or Sc scales 
is highly predictive of underachievement. 

Another approach to this study of relationships 
between personality and medical school 
achievement was made by studying the rela- 
tive success of students with deviant MMPI 
profiles and those with profiles within the nor- 
mal range. There were 18 members of the 
class (21.6%) who had freshman MMPI pro- 
files with at least one of the clinical scales 
showing a T-score of 70 or greater. Fifteen 
of these constituted the “deviant” sample. 
Fifteen students with profiles entirely within 
the normal range were selected so as to be 
matched with the “deviant” group for ACE 
scores. The range of the differences between 
the ACE scores of the matched students ran 
from zero to six percentile points. Table 3 
reports the mean honor point ratios for the 
“deviant” and “non-deviant” groups at the 
end of the junior year. Figure 3 shows the 
mean profiles of the “deviant” and “non- 
deviant” group. The mean profile of the 
“deviant” group reflects the fact that ten of 
the fifteen students in this group had a score 
of 70 or greater on the Mf scale. It is obvious 
that the two groups are essentially identical 
in their academic performance as expressed by 
the honor point ratio. 

It was considered of interest to make one 
additional study of MMPI profiles and 
achievement. This was a study of the rela- 
tionship between medical school class rank at 
the end of the junior year and the amount of 
difference between the freshman and junior 
year MMPI profiles. For this purpose the 
same matched samples of upper and lower 
quarter students for whom mean honor point 
ratios are reported above were used. Figures 
4 and 5 indicate the freshman and junior pro- 
files of these two samples. It is quite clear 
that the upper quarter students show a much 


Table 3 

Means and Standard Deviations of Honor Point Ratios 
(Junior) of Medical Students with “Deviant” and 
“Non-Deviant” Freshman MMPI Profiles 

                    ACE Percentile        Honor Point Ratio 
Group         N     Mean      S.D.        Mean      S.D. 
Deviant       15    (illegible)           1.58      .49 
Non-Deviant   15    72.7      5.23        (illegible) 

Fig. 3. Mean MMPI profile of a sample of fifteen 
medical students having at least one clinical score 
over T = 70, and mean profile of fifteen students with 
no deviant score, both samples matched for ACE 
scores. 


greater change in their mean profile over the 
two year interval than do the lower quarter 
students. The top quarter sample showed a 
reliable increase in mean score on the Sc scale, 
and statistically significant decreases in means 
on the Mf and Ma scales. Thus, the top 
quarter students showed a tendency after two 
years in the medical curriculum toward a “de- 
femininization” of their interest and activity 
pattern (Mf) although remaining clearly de- 
viant from the general population males. 
Likewise, their morale, optimism, enthusiasm 
and self confidence showed a drop toward the 
general population norm (Ma) which may re- 
flect a more realistic appreciation of their ca- 
pacities and the demands of medical training. 
The increase in Sc suggests a tendency to 
greater self analysis and general philosophical 
probing which is probably in line with the 
drop in manic features. 

By contrast, the bottom quarter sample revealed 
little tendency to reliable change over 
the two year interval, the sole change having 
statistical reliability being a drop in the Ma 
score suggestive of mild deterioration of 
morale. Tables 4 and 5 present the means 
and standard deviations of the scale scores for 
both years and both samples together with 
measures of the reliability of the freshman-junior 
differences. 

Fig. 4. Mean freshman and junior year MMPI 
profiles of a sample of medical students in the top 
quarter of their class at the end of the junior year. 
(N = 11). 

As a further check on the relationship be- 
tween amount of change in MMPI profile and 
academic performance a scattergram was pre- 
pared to show the joint distribution of these 
two variables for the entire class of 83 stu- 
dents. The “change” score for each subject 
was obtained by adding, without regard to 
sign, the differences between his freshman and 
junior scores on each of the nine clinical scales. 
For the total sample, this variable showed a 
range of 29-118 T-score points, with a mean 
of 54.90 and a standard deviation of 17.16. 

Fig. 5. Mean freshman and junior year MMPI 
profiles of a sample of medical students in the bottom 
quarter of their class at the end of the junior year. 
(N = 11). 


The honor point ratios for the group had a 
range of 1.1-2.8, with a mean of 1.71 and a 
sigma of .35. Inspection of the scattergram 
did not suggest any marked relationship be- 
tween these two variables although it did ap- 
pear that there was a slight tendency for 
higher honor point ratios to be associated with 
higher change scores. Table 6 indicates the 
means and sigmas of the honor point ratios 
for the subjects having the 20 lowest and the 
20 highest MMPI change scores. The differ- 
ence between the mean honor point ratios of 
these two groups is not statistically reliable. 
Comparison of the distribution of honor point 
ratios for these two groups revealed consider- 
able overlap. 
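
The “change” score defined above (the sum, without regard to sign, of the freshman-to-junior differences on the nine clinical scales) can be sketched as follows. The two profiles are hypothetical T scores invented for illustration, and the function and variable names are likewise assumptions, not part of the study.

```python
# The nine clinical scales over which the change score is summed.
CLINICAL_SCALES = ("Hs", "D", "Hy", "Pd", "Mf", "Pa", "Pt", "Sc", "Ma")


def change_score(freshman, junior):
    """Sum of absolute freshman-to-junior T-score differences on the clinical scales."""
    return sum(abs(junior[s] - freshman[s]) for s in CLINICAL_SCALES)


# Hypothetical T-score profiles for one student.
freshman = {"Hs": 47, "D": 48, "Hy": 50, "Pd": 53, "Mf": 64,
            "Pa": 50, "Pt": 51, "Sc": 51, "Ma": 60}
junior = {"Hs": 50, "D": 46, "Hy": 46, "Pd": 58, "Mf": 59,
          "Pa": 51, "Pt": 52, "Sc": 56, "Ma": 55}

print(change_score(freshman, junior))
```

Because signs are ignored, the score measures how much a profile moved but not in which direction, so two students with equal change scores may have shifted on entirely different scales.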

Table 4 

Means and Standard Deviations of Freshman and Junior Year MMPI Scores for a Sample of 
Top Quarter Medical Students (N = 11) 

Scale    Freshman Mean    Junior Mean 
L         1.2              1.5 
F         3.7              3.1 
K        57.1             61.6 
Hs       47.4             50.0 
D        47.4             46.8 
Hy       49.3             46.2 
Pd       53.0             57.6 
Mf       64.0             59.3 
Pa       49.7             51.3 
Pt       50.9             52.3 
Sc       51.0             56.0 
Ma       59.7             54.8 

[The standard deviation and t columns are not legible in this copy, apart from a freshman standard deviation of 1.1 for L.] 

* Significant at 5% level. 
** Significant at 1% level. 

Table 5 

Means and Standard Deviations of Freshman and Junior Year MMPI Scores for a Sample of 
Bottom Quarter Medical Students (N = 11) 

         Freshman             Junior 
Scale    Mean     S.D.        Mean          S.D. 
L         2.7      1.2         1.8          2.4 
F         3.2      2.4         2.4          1.4 
K        61.0      6.8        53.0          6.2 
Hs       50.9      6.4        51.7          5.1 
D        50.5      8.0        50.2          7.4 
Hy       57.4      6.2        56.5          5.6 
Pd       57.1      9.2        58.6          7.0 
Mf       62.0     12.2        59.2          9.6 
Pa       50.0      6.5        50.2          5.9 
Pt       54.6      8.2        (illegible)   5.1 
Sc       55.4      6.6        (illegible)   6.8 
Ma       63.5      6.6        (illegible)   7.1 

[Scale labels other than Ma are restored from the scale order of Table 4; the t column is not legible in this copy.] 

* Significant at 5% level. 
** Significant at 1% level. 

Table 6 

Means and Standard Deviations of Honor Point Ratios 
of the 20 Students Having the Lowest Amount of 
MMPI Score Change, Freshman to Junior 
Year, and of the 20 Students with 
Greatest Change 

               Honor Point Ratio 
Group          Mean     Sigma 
Low Change     1.7      .28 
High Change    1.8      .37 

F = 1.746 
t = .84 


Summary and Conclusions 


Using total honor point ratio at the end of 
the junior year of medical school as a criterion, 
an attempt was made to investigate the 
relationship between personality tendencies, as 
revealed in a freshman year MMPI profile, 
and academic performance. Also a study was 
made of the relationship between amount of 
personality change between the freshman and 
junior years and scholastic achievement. 
These analyses were based on data for 83 male 
students who entered the University of Minnesota 
Medical School in 1946. 

1. When the average profile of upper quarter 
students was compared with that of lower 
quarter students, with the subjects of the two 
samples matched for ACE scores, certain of 
the scales revealed reliable mean differences 
between the two groups. The scales yielding 
reliable differences between the top and bot- 
tom quarter samples were Hy, Pd, and Sc. In 
general, the low quarter students revealed a 
tendency toward greater neuroticism and de- 
fection in interpersonal and social relation- 
ships. 

2. When students were separated into two 
groups with members of the groups matched 
for academic aptitude (ACE) but differen- 
tiated by the occurrence and non-occurrence 
of high points of their profiles on the Hy, Pd, 
or Sc scales of the MMPI, it was found that 
the group having such high points was clearly 
inferior in academic performance (HPR) to 
the group not showing these scales as high 
points. Ninety per cent of the group with 
high points on Hy, Pd, or Sc had honor point 
ratios falling below the median HPR (1.75) 
of the group without these high points. 

3. The fact of an MMPI profile with at 
least one elevated clinical score (T > 70) did 
not appear to be predictive of inferior aca- 
demic performance. When a group of stu- 
dents each of whom had at least one elevated 
score was compared with a group having no 
elevations, with members of the two groups 
matched for ACE, the honor point ratios of 
the two groups were found to be essentially 
identical. 


4. It was found that the samples of first 
and fourth quarter students, equated for ACE 
scores, were very different with respect to the 
amount of change in their respective MMPI 
profiles from the freshman to the junior years. 
The top quarter students revealed a reliable 
change in mean score on three of the nine 
clinical scales (Mf, Sc, and Ma). The bottom 
quarter sample showed reliable freshman-to- 
junior changes only in decrease in their Ma 
score. 

5. When a comparison was made of the 
honor point ratios of students having the 
largest amount of change in their clinical 
MMPI scores from the freshman to the junior 
years and the honor point ratios of students 
with the smallest amount of change, it was 
found that the two groups had essentially 
identical honor point ratios and there was con- 
siderable overlap between the two groups. 

6. In general, it appears that when aca- 
demic aptitude is constant, the likelihood of 
achievement up to capacity in the medical cur- 
riculum becomes less as hysteroid, psycho- 
pathic, and schizoid traits, measured by the 
MMPI, are greater. It may be hypothesized 
that students who show both a restricted 
scholastic promise and marked deviation on 
the Hy, Pd, or Sc scales would be particularly 
poor academic risks. In the absence of any 
limitation of academic aptitude, the admis- 
sion to medical training of students showing 
chief deviations (even though within the 
“normal” limits) on the Hy, Pd, and Sc vari- 
ables would appear to make for a lowering of 
the general level of scholarship of the medical 
school class. 

Kendall (2, p. 89) mentions: “. . . the
desirability of examining the primary data to 
see if there are any obvious effects present.” 
The present note aims to enlarge upon this. 
Ranking is often a quick and meaningful 
method of obtaining various types of psycho- 
logical data. Sometimes subjects are asked to 
rank various items in order of preference, such 
as those in the Job Preferences Scale of Jur- 
gensen (1). Usually, in order to obtain some 
sort of over-all picture such rankings for each 
item are summated, and mean ranks are then 
calculated. After this the mean, or total 
scores are placed in rank order once more. 

Such a procedure may conceal important 
information. This is becoming apparent in 
data being accumulated on an Anglicized ver- 
sion of Jurgensen’s Scale, but, as the number 
of subjects in this study is small, the data will 
not be published at present. It seems that 
such data should first be arranged to show the 
number of times each item is placed in each 
rank. In other words a frequency distribu- 
tion should be made for each item. This dis- 
tribution may be turned into a graph for those 
who prefer to look at their results in this way. 

The following example should make this 
clear, although it is much more simple than is 
usually encountered, since preferences are 
asked concerning three items only. Wyatt, 
Langdon, and Stock (3, p. 12) asked 19 opera- 
tives engaged on chocolate (candy) packing to 
rank in order of preference three sizes of 
boxes, viz., 14 lb., 4 lb., and 1 lb. (The situ- 
ation is somewhat unreal in Britain today.) 
Their results are as follows: 


                 1st   2nd   3rd

Large boxes       10     1     8
Medium boxes       3    14     2
Small boxes        6     4     9


If these figures are summated, their mean 
ranks calculated, and the boxes arranged to 
give a final ranking for all operatives, we have 
the following results: 


                 Σfx   Mean rank   Rank

Large boxes       36     1.89        1
Medium boxes      37     1.95        2
Small boxes       41     2.16        3


This latter table suggests that there is little 
difference in the over-all order of preference, 
whereas in fact the operatives tend to either 
like, or dislike, both the large and the small 
boxes, according to individual choice, while 
most of them place the medium boxes in the 
middle. The investigators found this to be of 
some importance, as there was a close corre- 
spondence between output in packing one type 
of box and preference for that box. This, 
however, is not the place for an argument into 
which is cause and which effect. The example 
is merely an illustration of Kendall’s plea. 
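
The point can be checked numerically. The Python sketch below tabulates the box-preference example as a frequency distribution and then collapses it to rank sums and mean ranks. The cell counts for the large and medium boxes are garbled in this copy; the values used here are inferred from the printed rank sums (36, 37, 41) and the 19 operatives, and should be treated as an assumption. Function and variable names are illustrative only.

```python
# Frequency of each rank (1st, 2nd, 3rd) given to each box size by 19 operatives.
# The large and medium rows are inferred from the printed rank sums (assumption).
FREQUENCIES = {
    "Large boxes": [10, 1, 8],
    "Medium boxes": [3, 14, 2],
    "Small boxes": [6, 4, 9],
}


def summarize(freqs):
    """Return (rank_sum, mean_rank, share_placed_second) for one item."""
    n = sum(freqs)
    rank_sum = sum(rank * f for rank, f in enumerate(freqs, start=1))
    return rank_sum, rank_sum / n, freqs[1] / n


def final_ranking(table):
    """Order items by mean rank, lowest (most preferred) first."""
    return sorted(table, key=lambda item: summarize(table[item])[1])


if __name__ == "__main__":
    for item, freqs in FREQUENCIES.items():
        rank_sum, mean_rank, second = summarize(freqs)
        print(f"{item:13s} freq={freqs} sum={rank_sum} "
              f"mean={mean_rank:.2f} placed 2nd by {second:.0%}")
    print("Final ranking:", final_ranking(FREQUENCIES))
```

The mean ranks lie close together, while the frequency rows show that the large and small boxes split the operatives between first and third place and the medium boxes are mostly placed second.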

Identification of American, British, and Lebanese Cigarettes


E. Terry Prothro 


American University of Beirut, Lebanese Republic 


Habitual smokers generally believe that 
they can differentiate between various brands
of cigarettes. In many countries a smoker has 
a wide choice of both domestic and foreign 
cigarettes, and there is often a substantial dif- 
ference in price between one brand and an- 
other. Obviously smokers must believe in a 
discriminable superiority of the more expen- 
sive brands if these brands are to be smoked 
for reasons other than “conspicuous consump- 
tion.” Advertisers, of course, encourage the 
belief that different cigarettes have unique 
characteristics. 

Investigations, however, by American psy- 
chologists throw considerable doubt on the be- 
lief that cigarettes can be identified by persons 
who do not know the brand they are smoking. 
Hull (1) found in the course of investigations 
on another problem that his Ss frequently
failed to distinguish between tobacco smoke 
and warm moist air if visual cues were elimi- 
nated by a blindfold. Husband and Godfrey 
(2) requested blindfolded Ss to identify five 
American brands of cigarettes, and found that 
performance was only slightly better than 
chance on all brands except a mentholated 
one. 

More recently Ramond, Rachal and Marks 
(3) examined the ability of habitual smokers 
to identify three popular American brands. 
They gave each of their subjects a practice 
smoking session during which there was op- 
portunity to study the characteristics of the 
three brands. They did not blindfold their 
subjects on grounds that “a blindfold ob- 
scures the central problem.” Thus the sub- 
jects could examine the texture of cigarette 
paper, the color and size of the tobacco shreds, 
etc. During the test session gummed labels 
were placed over the brand names. Their sub- 
jects were able to identify each of the three 
brands slightly more often than chance. 
Smokers who preferred one of the three brands 
were able to identify that brand significantly
more frequently than could smokers of other
brands. 

If we grant that even habitual smokers in 
America have difficulty in distinguishing be- 
tween American brands of cigarettes, two 
questions present themselves. Is the difficulty 
a result of similarity of all tobacco smoke? 
Does the fact that American subjects in these 
experiments tend to smoke one brand of ciga- 
rettes to the exclusion of others affect their 
ability to identify non-preferred brands? 

The situation in Lebanon is well suited to 
a preliminary investigation of these questions. 
Both American and non-American brands are 
used extensively, and the difference in price 
of various brands causes college students to 
vary the brand purchased as their own finan- 
cial status fluctuates. Also there is some 
fluctuation in availability of brands on the 
market. 

In Lebanon, as in most of the Arab Near 
East, American, English and domestic ciga- 
rettes are available. Of these the American 
cigarettes are the most expensive. English 
brands cost about 10 per cent less. Domestic 
cigarettes—made from tobacco grown in the 
Near East—are about half as expensive as 
American brands. The sale of cigarettes is 
under control of a government-supported 
monopoly which establishes prices and deter- 
mines what cigarettes are to be imported. At 
the present time two American brands, Camel 
and Lucky Strike, and two English brands, 
Players and Gold Flake, are found on the 
market. 


Procedure 


Subjects were 50 male college students who 
stated that they smoked at least five cigarettes 
per day. They were obtained by asking for 
volunteers from the student body of the Amer- 
ican University of Beirut. 

Each S was brought into a well-ventilated 
room and shown a table on which there were 
six packages of cigarettes. There was one 
package each of the following brands: Camel,
Lucky Strike, Gold Flake, Players, Bafra and 
Star. The choice of American and English 
cigarettes was based on availability to con- 
sumers. Bafra and Star were selected as being 
among the most popular of the Lebanese ciga- 
rettes. An effort was made by E to use ciga- 
rettes of equal freshness. The S was told that 
he would be presented with one cigarette from 
each of the six packages in turn, and that he 
was to try to identify each cigarette im- 
mediately after smoking it. He was warned 
that each guess was final and that he could 
not change his opinion about one cigarette 
after smoking some of the others. Thus if he 
guessed Bafra for the first cigarette and then 
decided that the second cigarette was actually 
the Bafra he was permitted to name the sec- 
ond “Bafra” but not permitted to change his 
guess on the first cigarette. 

The S was next asked which cigarette he 
preferred. Then he was seated and blind- 
folded. Cigarettes were placed in wooden 
holders which were 6 cm. long. The holder 
was placed in the S’s mouth and the cigarette 
was lit for him. He was not permitted to 
touch or to see the cigarette at any time. As 
soon as he identified the cigarette it was re- 
moved and placed in a water-filled can. The 
S was then permitted to rinse his mouth at a 
water fountain just outside the experimental 
room. Approximately two minutes elapsed 
from the time one cigarette was identified until 
the next one was lit. It was hoped that the 
use of blindfolds and holders would minimize
available cues so that successful identifications
not attributable to chance might be attributed 
to the qualities of the smoke itself. 

The order in which the cigarettes were pre- 
sented varied from subject to subject, and was 
determined by use of a table of random num- 
bers. 


Results 


It can be seen from Table 1 that our sub- 
jects were able to identify the American and 
English brands about half of the time and to 
identify the Lebanese brands even more often. 
Bafra was the most easily identified of these 
brands. Only seven of the subjects failed to 
identify it. Of the 300 attempts at identifica- 
tion, 180 or 60 per cent were correct. These 
results are considerably better than chance, 
and the superiority to chance is highly signifi- 
cant statistically. The value of chi-square for 
Table 1 is 462. This value is much too large 
to be found in the average table of chi-square. 
Moreover, each brand was identified better 
than chance. If we consider only the cells 
which pertain to correct identification, we find 
that the value of chi-square for these cells 
varies from 32 to 122. All of these values are 
highly significant. 
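
As a rough check on the overall figure, the following Python sketch compares the 180 correct identifications out of 300 with the number expected from blind guessing. It assumes a chance rate of one in six (six brands, one guess per cigarette) and uses a one-degree-of-freedom chi-square on correct versus incorrect responses. This is not the computation behind the value of 462 reported for the full table, which requires the individual cell entries. Function names are illustrative.

```python
import math


def chi_square_vs_chance(correct, trials, chance):
    """One-df chi-square comparing correct/incorrect counts with chance expectation."""
    expected_correct = trials * chance
    expected_wrong = trials - expected_correct
    wrong = trials - correct
    return ((correct - expected_correct) ** 2 / expected_correct
            + (wrong - expected_wrong) ** 2 / expected_wrong)


def p_value_one_df(chi_sq):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_sq / 2.0))


if __name__ == "__main__":
    chi_sq = chi_square_vs_chance(correct=180, trials=300, chance=1 / 6)
    print(f"chi-square = {chi_sq:.1f}, p = {p_value_one_df(chi_sq):.3g}")
```

Even this collapsed comparison gives a chi-square far beyond conventional significance levels, in line with the conclusion drawn from the full table.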

From these results it appears that all cigarette
smoke is not the same. Habitual smokers
can differentiate between these six brands.

Table 1

Number of Subjects Giving Each Response after Smoking Each of the Brands

Brand Named by Subject: Camel, Lucky Strike, Gold Flake, Players, Bafra

Brand Smoked
Camel            24    5
Lucky Strike      8
Gold Flake        7
Players           5
Bafra
Star
Total

The difference between our results and the
conclusions of Husband and Godfrey, and of
Ramond et al. might lead us to conclude that
the cigarettes which are popular in the Near
East are less similar to each other than are the
popular American brands. There is, however, 
another possible explanation. Lebanese stu- 
dents vary the brands smoked to a greater 
extent than do American students. Conse- 
quently the Lebanese students may be better 
able to differentiate cigarettes because of a 
more varied experience. 

In this connection it should be noted that 
our Ss identified the American brands quite 
successfully. There was little confusion be- 
tween Camels and Lucky Strikes. 

The results of Ramond et al. support the 
thesis that preference determines identifiabil- 
ity in America. Their Ss could identify the 
brand which they preferred more than 70 per 
cent of the time. On the other hand, those Ss 
who preferred a brand other than the ones 
used in the experiment averaged only 20 per 
cent correct identification, although chance 
performance was 33 per cent. 

Of our Ss, 44 expressed a preference for one 
of the six brands and 26 (nearly six-tenths) 
of these were able to identify the preferred 
brand. When we recall that exactly six-tenths 
of all 300 trials were correct, it is apparent 
that our students could identify non-preferred 
brands as readily as they could identify pre- 
ferred brands. 


Summary 


A total of 50 male students at the American 
University of Beirut who smoke at least five 
cigarettes per day were asked to discriminate 
between six brands of cigarettes which are 
popular in the Near East. Of the six brands,
two were American, two British and two
Lebanese. Ss were blindfolded and presented 
with six cigarettes in succession. They were 
required to guess at the identification of each 
brand before proceeding to the next. All ciga- 
rettes were presented in wooden holders. 

Ss were able to identify each of the six 
brands significantly more often than chance. 
Of all attempts at identification, 60 per cent 
were correct. It therefore appears that ha- 
bitual smokers can discriminate between these 
cigarettes on a basis of the smoke alone. 

It was pointed out that the superior per- 
formance of our subjects, even at distinguish- 
ing between Camels and Lucky Strikes, might 
be attributed to the tendency of Lebanese 
smokers to vary the brand smoked to a 
greater extent than do American smokers. 
The results are compatible with this thesis, for 
our subjects were able to identify non-pre- 
ferred brands as readily as preferred brands. 
In contrast, a recent study (3) of American 
smokers demonstrated that they could identify 
the brand they preferred, but could not iden- 
tify other brands. 

An Apparatus for Measuring Operational Hand Steadiness


J. Stanley Gray, George Sustare, and Anthony Thompson 


University of Georgia 


It is a patent belief of both job analysts 
and skilled workmen that the degree of skill 
is affected by hand steadiness. This study is 
an attempt to investigate that belief. 

Available apparatus for measuring hand 
steadiness (like the Whipple Tracing Board) 
was found to measure predominately static 
steadiness or tremor. Skilled work involves 
operational steadiness. Consequently, an ap- 
paratus was devised to measure hand steadi- 
ness in three dimensions. This stasiometer 
consists of a 24” x 30” base on which are 
mounted the ends of seven feet of 14-inch 
copper tubing bent in three dimensions, an 
electric counter, a transformer, and a knife 
switch. A brass stylus-ring, through which 
the tubing passes, is connected to one terminal 
of the transformer and the copper tubing to 
the other through the counter. Contact be- 
tween the stylus-ring and the tube activates 
the counter. See Figure 1 for schema. 








Fig. 1. Schematic diagram of stasiometer.


A test run consists of passing the stylus- 
ring along the bent tubing from one end to 
the other and back again as rapidly as possible 
with the least number of contacts. After a 
practice training run, the time in seconds and 
the contacts (as recorded by the counter) are
recorded for two runs, or four crossings. A 
minute rest is allowed between these two re- 
corded runs. 


Table 1

Mean Time and Contacts for Two Runs on the Stasiometer

                   Mean     S.D.
First Run
  Time (Sec.)      80.9     31.7
  Contacts        115.9     34.9
Second Run
  Time (Sec.)      78.9     29.6
  Contacts        107.2     35.3


The test was administered to a norm group 
of 400 undergraduate university students (222 
men and 178 women). The mean time and 
mean contacts for each of two runs are shown 
in Table 1. A table of sigma values was then 
constructed and each raw score was converted 
into a Z score (mean 100, sigma 30). A Z 
score was thus obtained for each run (time 
plus contacts). The coefficient of correlation 
between these runs was .84 ± .01, which was
interpreted to indicate a satisfactory reliabil- 
ity. 
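
The conversion to standard scores can be sketched as follows. This Python example rescales a raw score to a mean of 100 and a sigma of 30 using norm-group statistics. Two points are assumptions rather than details given in the text: the sign is reversed so that shorter times and fewer contacts give higher scores, and the time and contact scores for a run are combined by averaging. The norm values are the first-run figures from Table 1 as read in this copy, where the assignment of values to the Mean and S.D. columns is itself assumed.

```python
def standard_score(raw, norm_mean, norm_sd, lower_is_better=True):
    """Rescale a raw score to mean 100, sigma 30 (the sign flip is an assumption)."""
    z = (raw - norm_mean) / norm_sd
    if lower_is_better:
        z = -z
    return 100.0 + 30.0 * z


def run_score(time_sec, contacts, norms):
    """Average the time and contact standard scores for one run (assumed rule)."""
    t = standard_score(time_sec, norms["time_mean"], norms["time_sd"])
    c = standard_score(contacts, norms["contacts_mean"], norms["contacts_sd"])
    return (t + c) / 2.0


# First-run norm statistics as read from Table 1 (column assignment assumed).
FIRST_RUN_NORMS = {"time_mean": 80.9, "time_sd": 31.7,
                   "contacts_mean": 115.9, "contacts_sd": 34.9}

if __name__ == "__main__":
    # A subject exactly at the norm means, then a hypothetical steadier subject.
    print(run_score(80.9, 115.9, FIRST_RUN_NORMS))
    print(round(run_score(65.0, 90.0, FIRST_RUN_NORMS), 1))
```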

The Z scores for each run were then added 
to obtain a total steadiness test score. The 
distribution of the 400 norm cases is shown 
in Figure 2. 

Various factors which might affect steadi- 
ness were studied but only two showed any 
statistical significance. Sex was a highly sig- 
nificant factor, the men having 10 Z-score 
points higher than the women. Smoking had 
only a slightly significant effect on steadiness, 
as indicated in Table 2. 

A total of 100 members of the norm group
was given the Edwards’ finger tremor test.¹
The coefficient of correlation was .004, indicating
that hand operational steadiness and hand
static steadiness are not related.

¹ Edwards, A. S. The finger tromometer. Amer. J. Psychol., 1946, 59, 273-283.

Fig. 2. Distribution of 400 standard scores on the stasiometer. (Vertical axis: frequency. Horizontal axis: score intervals 24-35, 36-47, 48-59, 60-71, 72-83, 84-95, 96-107, 108-119, 120-131, 132-143, 144-155.)

Another group of 50 subjects was given the 
Purdue Pegboard dexterity test. The correla- 
tion of these scores with those on the stasi- 
ometer was .057. 

Table 2

The Effects of Smoking and Sex on Operational Hand Steadiness

                          Mean Steadiness
                  N           Z Score
Smokers          225           101.9
Non Smokers      175            99.2

Male             222           105.1
Female           178            95.1

The stasiometer test was given to 50 skilled
workmen (tool and die makers, machinists,
sheet metal workers, and welding inspectors).
The average Z score for this group was 121.5, 
S.D. 16.7, as compared with a mean of 100 and 
S.D. of 30 for the norm group. The CR of 
the difference between these averages was 7.6. 
Further validity data are being collected. 
Apparently the stasiometer is a reliable in- 
strument for measuring operational steadiness 
and it may have some usefulness in selecting 
apprentices for various skilled occupations. 
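
The critical ratio quoted above can be reproduced approximately from the reported figures. The Python sketch below assumes the CR is the difference between the two means divided by the standard error of that difference for independent samples, sqrt(sd1²/n1 + sd2²/n2). The text does not state the exact formula used, so this is an assumption.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two independent means divided by its standard error."""
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff


if __name__ == "__main__":
    # Skilled workmen (N = 50) versus the norm group (N = 400).
    cr = critical_ratio(121.5, 16.7, 50, 100.0, 30.0, 400)
    print(f"CR = {cr:.2f}")
```

With these inputs the ratio comes out close to the reported 7.6; a small discrepancy could reflect rounding or a slightly different standard-error formula.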

Miller, Delbert C., and Form, William H.
Industrial sociology; An introduction to the 
sociology of work relations. New York: 
Harper & Brothers, 1951. Pp. 896. $6.00. 


The objective of this book is to present the 
sociology of work relations. The word indus- 
trial is used as referring to all forms of eco- 
nomic activity. Industrial Sociology includes 
the study of occupations and all social groups 
that affect work behavior. Conceiving the 
subject from this point of view, the book deals 
with the interrelationships between the work 
behavior of the individual and the other as- 
pects of his social activities. “The Frame- 
work of Industrial Sociology” (p. 30), as the 
five major subdivisions of the book, include: 
“I. Industrial Sociology: Its Rise and Scope;
II. The Social Organization of the Work 
Plant; III. Major Problems of Applied In- 
dustrial Sociology; IV. The Social Adjustment 
of the Worker; and V. Industry, Community 
and Society.” 

Involving themselves with such broad interest
areas without reaching encyclopedically
precise detail will stimulate instructors and
critics to judge the content and choice of the
authors’ evaluative selections. Such selections
are exemplified by the identifying of “The rise 
of industrial sociology . . .” (p. 3), with the 
familiar Hawthorne experiments; a sample 
curriculum for the training of an industrial 
sociologist (p. 86), and a chart, included both 
on the front cover and on page 11, listing the 
chronological “Outlines of the Main Streams 
& Tributaries of Industrial Relations Knowl- 
edge Contributed by the Basic & Applied So- 
cial Sciences.” The authors have provided an 
interesting base from which to work irrespec- 
tive of the specific selections of the materials 
presented. Of particular value to this area 
which overlaps many disciplines is a glossary 
at the end of the book. 

In a volume of this size there is much mate- 
rial which any particular group of readers may 
consider extraneous. The first 306 pages gen- 
erally deal with basic informative material. 
To students who have had some particularized
education in labor movements, industrial economics,
business administration, and courses
in applied psychology, these materials may
prove to be repetitious and tend to cause a
general letdown before such students get to
the more strictly sociological material. As
was suggested in an article by the reviewer in
The Journal of Educational Sociology, November
1950, “. . . the basic principles underlying
industrial sociology are composed of
established sociological principles and that in- 
dustrial sociology represents a distinctive area 
of investigation for the sociologist which in 
large part he has left to economists and psy- 
chologists.” 

This volume is very comprehensive. As a 
reference for allied courses the comprehensive- 
ness of this volume has great value in pointing 
up the interrelationship of the associated as- 
pects of industrial relations. As a text for in- 
dustrial sociology this very comprehensiveness 
makes it somewhat difficult to point up the 
basic underlying principles of this interest 
area. However, the book is unquestionably a 
real contribution to the role and teaching of 
industrial sociology. 

Glaister A. Elmer 


Air University Far East Research Group 


IES Lighting Handbook. Second Edition. 
New York: Illuminating Engineering Society,
1952. Pp. 974. $8.00.


This new edition represents a thorough re- 
vision of the original Handbook which was 
published in 1947. Its objective is to provide 
its readers with essential information on light 
and lighting in simple terms and condensed 
style. The introductory chapters are con- 
cerned with the physics of light, light and 
vision, and nomenclature together with defini- 
tions and symbols. These are followed with 
several chapters dealing with measurement of 
light, color, light control, daylighting, light 
sources and lighting calculations. There are 
then several chapters dealing with lighting in 
various situations such as interiors, exteriors, 
highways, aviation, transportation, and photography.
The book is concluded with an extensive
appendix, manufacturer’s data (advertising),
and an index. The numerous charts
and illustrations are very useful and excel- 
lently done. 

Although designed for use by illuminating 
engineers, there is much material included that 
can be useful to the applied psychologist. 
Special mention might be made of the sections
on light and vision, nomenclature and
measurement. Much of this material should
be known by the psychologist who is dealing 
with illumination in relation to visual comfort 
and efficiency. Other sections of particular 
interest to psychologists are the chapters on 
color and on interior lighting. Materials on 
recommended and standard practices are not 
included but the bulletins in these areas are 
listed opposite the title page. 

The collection and organization of materials 
in this Handbook represents an extensive and 
difficult task. The committee in charge is to 
be congratulated on achieving an excellent re- 
sult. No illuminating engineer or psychologist 
interested in the applied aspects of lighting 
can afford to be without this reference book. 
Nevertheless, there are a few reservations that 
occur to the reviewer: (1) There is a tendency
to neglect psychological factors in adjustment
of the individual to the illumination of working
and living environments. In future revisions
it might be well to include a chapter on
this subject. (2) Considerable work in the 
field of illumination has been done by psy- 
chologists. Examination of the lists of refer- 
ences fails to disclose these reports except for 
rare instances. It would seem that the best 
results in lighting could be achieved by co- 
ordinating the work of engineers with that of 
psychologists, physiologists including medical 
men, and physicists. (3) The presentation of 
certain data may lead to misinterpretations. 
For instance, in presenting Weston’s data, 
curves for relative performance but not for 
actual performance are given. The uncritical 
reader might interpret the curves presented to 
mean that, if the illumination is high enough,
discrimination of the low contrast test object 
will equal that of the high contrast one. Ex- 
amination of the performance curves in the 
original report reveals that this is not so. In 
a similar manner, the data on speed of vision 
(Cobb, Ferree and Rand) is plotted in terms
of the reciprocal of the time. This produces
an exaggerated picture of the improvement 
with increase in illumination intensity. 


Miles A. Tinker 


University of Minnesota 


Frederiksen, N., and Schrader, W. B. Adjust- 
ment to college. Princeton: Educational 
Testing Service, 1951. Pp. xvii + 504. 


Based on a study of 10,000 men veteran 
and non-veteran students in sixteen American 
colleges following World War II, this book has
much to contribute to current educational and 
psychological theory and practice. Even 
though the population studied may, it is 
hoped, never be duplicated, the extent and 
form of this investigation are such that its 
implications are and will be important. 

The book has a somewhat novel organiza- 
tion, for the whole study is summarized in the 
first chapter on a level clearly appropriate for 
the statistically untrained reader. In the re- 
maining chapters, the results are presented in 
generally simple tabular and graphic form, 
while the basic tables and methodological 
notes are contained in the appendices. Al- 
though the first chapter is clearly intended for 
the lay reader, the level of difficulty of the rest 
fluctuates somewhat more than would seem 
desirable, and the college administrator who is 
tempted to read on may encounter rough go- 
ing in certain places. 

Two methodological points are of special 
interest. The authors use as the criterion of 
academic adjustment an index called the 
“Average Adjusted Grade,” a “. . . measure 
of achievement-relative-to-ability. . . .” It 
is a standard score based on analysis of covari- 
ance procedures and represents a significant 
advance over other similar methods of com- 
puting such indices. Secondly, they make use 
of a sign test in assessing group differences, 
recognizing that samples from several colleges 
may be taken as replications of the experi- 
mental situation. This test makes for a preci- 
sion too seldom encountered; it is hoped that 
it and the above criterion method will receive 
the attention they deserve. 
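
The logic of a sign test across colleges can be illustrated briefly. In the Python sketch below each college is treated as one replication, and only the direction of the difference between two groups is counted; under the null hypothesis a difference is equally likely to fall in either direction. The counts used are hypothetical and are not taken from the book, and the function name is illustrative.

```python
from math import comb


def sign_test_two_sided(n_positive, n_negative):
    """Two-sided sign-test probability; ties are assumed to be dropped beforehand."""
    n = n_positive + n_negative
    k = min(n_positive, n_negative)
    # Probability, in one tail, of a split at least this lopsided.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical: one group scores higher in 13 of 16 colleges.
    print(round(sign_test_two_sided(13, 3), 4))
```

Because only the direction of each college's difference enters the test, it makes no assumption about the size or distribution of the differences.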

The content of the book is a detailed de- 
scription of the attitudes and behaviors of a 
group of men veterans and non-veterans and 
a comparison between these. In the process,
the authors dispose of a number of erroneous 
notions about these groups in particular and 
college students in general. For one thing, the 
similarities reported are more evident than are 
the differences, and the authors wisely recog- 
nize the importance of such “negative” results. 
Consequently, the book contains many de- 
scriptions of generally applicable relationships 
—and lacks of relationship—between the criterion
and such factors as extra-curricular activities,
vocational decision, family income,
outside reading habits, etc. 

Most of the book deals with the results of 
an extensive questionnaire intended to illumi- 
nate causes of obtained differences, if any. 
The authors are aware of the limitations of 
this method, and they report only what the 
students said. But it is easy to accept such 
statements at face value, something which
might be, in view of what is known about test- 
taking attitudes, seriously misleading. It is 
to be hoped that other investigators will follow 
the many interesting leads provided and con- 
duct studies employing more powerful tools. 


For any complete picture of the problem of 
college adjustment, there is a great need for 
the integration of such studies with those by 
men such as Pressey whose work on educational
acceleration makes a fairly clear case
in favor of the younger collegian. However,
since this book is not intended to be a sys- 
tematic integration but rather an extensive 
descriptive study, this should not be taken as 
a criticism of it but only as an indication of 
a pressing need for more work. 

Methodologically, this study should serve as 
a model for further research. As far as the 
results are concerned, both college administra- 
tors and psychologists interested in human 
adjustment problems should find in it a very 
great deal that is of interest and value. The 
study was imaginatively planned and carefully 
executed; the authors are to be commended 
for their excellent contribution. 


John W. Gustad 
University of Maryland 


Kelly, E. L., and Fiske, D. W. The prediction
of performance in clinical psychology. Ann 
Arbor: The University of Michigan Press, 
1951. Pp. 311. $5.00. 


This volume is the report of an ambitious 
five-year research program during the period 
1946-1951, which was directed at the evalua- 
tion of techniques for the selection of graduate 
students for training in a four-year doctoral 
program in clinical psychology. 

The first section of the report is devoted to 
a description of the operating philosophy of 
the project, which was to be both catholic and 
eclectic in the selection of predictors and cri- 
teria, a discussion of the sequential phases of 
the research program, and a presentation of 
normative data descriptive of the 700 subjects. 
Each subject was enrolled in one of 40 univer- 
sities and had field training in one of 50 VA 
installations. As one would anticipate, the 
normative data indicated that there was a 
hierarchy of universities in terms of ability 
and achievement of their students, and there 
were large differences in emphases of training 
programs at the various universities and VA 
installations. 

The second section deals with the three 
types of predictor measures under study. The 
first of these was a group of predictions by 
university staff members of the success of en- 
tering students upon examination of creden- 
tials only and upon examination of credentials 
plus interview in the following areas: Aca- 
demic Performance, Skill in Diagnosis and 
Therapy, Research Competence, and Overall 
Promise as a Clinical Psychologist. The sec- 
ond type of predictor was a series of objective 
tests from which 101 measures were obtained. 
These objective tests were commonly used 
measures of intelligence, interest, and person- 
ality, among the specific tests being the Miller 
Analogies (Form G), the Strong Vocational 
Interest Blank, and the Minnesota Multi- 
phasic Personality Inventory. The third type 
of predictor measure was a series of ratings 
based upon clinical procedures, which included 
intensive interviews and projective tests, and 
both individual and pooled ratings. Most in- 
teresting was a description of a pilot assess- 
ment program in which group situational tests 
were utilized in addition to other techniques.
Factor analysis was performed to identify the 
first-order factors of the some 42 variables 
under investigation in the pilot assessment
program. 

The third and largest section describes the 
development of criterion measures. There are 
interesting analyses of many problems en- 
countered, such as the first-order versus sec- 
ond-order and specific versus general criteria 
problems. The authors found no satisfactory 
single criterion of success, although they did 
identify three general components of success. 
These were: intellectual accomplishment, clin- 
ical skills of diagnosis and therapy, and skills 
in social relations. They found judges agreed 
much better on the first than the other two 
components. There are many ideas and find- 
ings from which others concerned with similar 
searches for the criterion will-o’-the-wisp may 
profit. Of course, the criteria developed are 
in a sense also predictors of later performance 
as clinical psychologists. Until a follow-up 
study has been made and the criteria utilized 
in this program have been related to on-the- 
job success, the findings of the program are 
questionable. The authors do state that they 
hope to follow up their subjects some ten, 
fifteen or twenty years later. Certainly this 
study does merit a fitting sequel. 

The fourth section presents data upon the 
degree to which the predictor measures correlate 
with the criterion measures and contains 
a thorough discussion of various factors which 
have an influence upon the magnitude of the 
correlation coefficients. 

The final section contains a summary of 
the major findings. Space prohibits the reviewer 
from commenting upon most of the 
findings presented either in this section or 
throughout the volume, which is literally 
studded with interesting findings. To the reviewer 
it was most significant that single objective 
tests predicted most of the criterion 
measures (including global measures, such as 
“Rated Overall Clinical Competence”) just 
about as well as more laborious and time-consuming 
ratings by professional staff members, 
and that single projective tests were almost 
worthless in predicting criterion measures. 

In addition, there are several appendices 
which present many of the devices utilized in 
the study and certain other important information, 
such as rejected criterion measures. 
While the general aim of the program as 
stated in the Preface was to evaluate tech- 
niques for the selection of professional per- 
sonnel, the authors do not purport to resolve 
all problems even within the limited area of 
the selection of clinical psychologists. Cer- 
tainly most of the predictive findings cannot 
be generalized to the selection of personnel for 
training in other professional areas, although 
many of the techniques should offer valuable 
suggestions to researchers. However, it is 
concentrated attacks of this nature which 
should eventually lead to the improvement of 
the selection of personnel for training in the 
professions. The study is must reading for all 
those working in the areas of prediction of 
professional success and of criterion research. 


Stanley E. Jacobs 


Department of the Army, 
Washington, D. C. 


Parker, W. E., and Kleemeier, R. W. Human 
relations in supervision. New York: McGraw-Hill, 
1951. Pp. vii + 472. $4.50. 


At one time or another, most personnel men 
have struggled with the problem of improving 
the human relations skills of company supervisory 
personnel, so there is a great deal of 
interest in any text which may prove to be 
useful in discussion or conference groups con- 
cerned with handling human relations prob- 
lems. 

In the authors’ words, Human Relations in 
Supervision is “directed specifically to the 
first-line supervisor, because the establishment 
of good human relations in any organization 
stands or falls upon the skill of these super- 
visors in dealing with human problems.” In 
general, the authors have succeeded in keeping 
the material at this level, employing many 
anecdotes, illustrations, and case studies in 
their attempt to relate human relations prin- 
ciples to the everyday experience of super- 
visors. One undesirable outcome of this level 
of treatment, however, is that much of the 
discussion on topics such as motivation, coun- 
seling, leadership and personal development is 
superficial. 

In directing this book to the first-line supervisor, 
the authors have emphasized the handling 
of problems originating with the employee 
and do not treat directly the problems 
of the supervisor and his impact on the work 
group, the relationships between the various 
supervisory and management levels, or the 
effects of company policy and organization 
structure on the supervisor. 

At least two suggestions for improvement 
come to mind. First, an introductory section 
or, possibly, a separate manual outlining the 
experience in companies using this material 
together with a statement as to the instruc- 
tional methods employed and the outcomes 
of the training would be of great value. Sec- 
ond, the discussion questions following each 
chapter should be reworked since in their pres- 
ent form they invite class members to parrot 
back text material. 

In summary, the authors have done a good 
job in assembling materials for a human rela- 
tions course for men at supervisory levels. 
Whether the textbook-classroom approach 
which is indicated can or will produce the 
desired change is still an unanswered question. 


William E. Kendall 
The Chesapeake and Ohio Railway Company 


Gray, J. Stanley. Psychology in industry. 
New York: McGraw-Hill Book Company, 
Inc., 1952. Pp. vii + 401. $5.00. 


This book reflects the author’s belief that 
any factor which affects the production efforts 
of workers is appropriately classified as in- 
dustrial psychology. This view has resulted 
in a different type of book on psychology in 
industry. It is, however, a disappointing 
book. 

Considerable emphasis is given human engi- 
neering, work curves, physical and physio- 
logical measurements of work, fatigue, effi- 
ciency, nutrition, rest, monotony, boredom, 
lighting and ventilation. Some subjects are 
handled differently than is customary; for ex- 
ample, merit rating is discussed in a chapter 
on wages. A five-page appendix describes and 
illustrates calculations of the mean, standard 
deviation, standard error of the mean, correla- 
tion coefficient, and significance of differences 
between means. 


Although all subjects discussed in the book 
may legitimately be included in the field of 
industrial psychology, relative emphasis de- 
viates sharply from that found in actual prac- 
tice. For example, twenty pages, or five per 
cent of the entire book, are devoted to nutri- 
tion. Subjects which are usually emphasized 
are discussed only briefly; for example, em- 
ployment interviewing is handled on one page. 
Thus the book should not be interpreted as 
giving a true picture of the field as it is com- 
monly conceived. 

The book has a number of faults: broad 
statements are undocumented, superficial defi- 
nitions are used, “obviousness” is used to sup- 
port statements, flat statements are made 
which run counter to experimental evidence 
published elsewhere, and broad coverage of subject 
matter results in superficiality. On the other 
side of the ledger are favorable factors such 
as inclusion of material not generally readily 
available to beginning students, uncommon 
use of common sense, and astute insights. 
Unfortunately, however, the assets do not ap- 
pear to offset the limitations of the book. 


Clifford E. Jurgensen 
Minneapolis Gas Company 


Zaleznik, A. Foreman training in a growing 
enterprise. Boston: Harvard Business 
School, 1951. Pp. 232. $3.50. 


“Is [supervisory] training realistic from the 
supervisor’s point of view and in relation to 
his problems at work? The only way to de- 
velop an answer to this question in a par- 
ticular organization is to go to the work level, 
and to observe what is happening” (p. 232). 
The author has done just this. This book is 
concerned with the evaluation of a foreman 
training program in a small manufacturing firm 
through 5 weeks of intensive on-the-job study 
of one of the trainees, a newly appointed fore- 
man. Two other approaches to evaluation are 
also reported—observation of the training ses- 
sions and interviews with foremen. 

The training course evaluated appears to be 
a rather confusing hodge podge of academic 
psychology, rules-of-thumb for handling peo- 
ple, and pep talks—all of which are not un- 
common approaches to foreman training in 
American industry today. In terms of being 
of value to this particular foreman in equip- 
ping him to better perform his job, the course 
was unsuccessful. 

Despite the rather shaky design upon which 
this study is built, with conclusions drawn 
and recommendations made on the basis of an N 
of a single foreman, this book makes a contribution. 
Its main value is in the convincing 
and meaningful manner in which the many 
complex relationships with which a modern 
factory foreman must deal are described. 
Pointing up how inadequately a typical pack- 
aged training program fulfilled the on-the-job 
needs of the foreman only helps to accentuate 
and sharpen the picture of the complexity of 
his job. 

There are a number of weaknesses in the 
study. The author tends to draw too many 
definite conclusions and to overgeneralize from 
his single case. Many of the conclusions and 
recommendations are colored by the back- 
ground and training of the author. For ex- 
ample, the only recommended method of fore- 
man training discussed in any detail is the case 
method. Recommendations on the kind of 
training which would have helped the foreman 
more, including such things as coaching by his 
superior and permissive rather than authorita- 
tive conferences, are not new. However, de- 
spite these shortcomings, against the back- 
ground of the real needs of a live supervisor 
on the job, the conclusions and recommenda- 
tions still are much more convincing than they 
are when they appear as mere statements of 
opinion as is usually the case. 

Because no serious reader can come away 
from this book without a fuller realization of 
the problem faced in developing supervisors, 
it can be highly recommended to persons con- 
cerned with supervisory training. If the book 
is read by this group, the future ought to 
produce more of the broad and continuing 
type of training needed for helping the foreman 
perform his difficult job. 


Theodore R. Lindbom 


Midland Cooperative Wholesale, 
Minneapolis, Minnesota 


Welch, J. S., and Stone, C. H. How to build 
a merchandise knowledge test. Research 
and Technical Report 8, Industrial Rela- 
tions Center, University of Minnesota. 
Dubuque, Ia.: Wm. C. Brown Company, 
1951. Pp. 21. $1.00. 


This excellent monograph concisely presents 
the methods for the development of job knowl- 
edge tests. Although the purpose is to de- 
scribe the steps in the construction of informa- 
tion tests for use in evaluating experience of 
salespersons, the procedures are general and 
can be applied to any type of job. 

The authors do not claim that they are de- 
scribing any new methods. What they have 
done is to bring together for trade tests the 
procedures for item development, item valida- 
tion, test validation, cross validation, and the 
setting of critical scores, in a most clear and 
logical fashion. The rationale for each step is 
well outlined. The monograph is liberally 
documented with judiciously chosen illustra- 
tions, so that each step is readily understand- 
able. 

The monograph will not only serve as a 
technical manual for those concerned with 
selection problems, but should be an invalu- 
able piece of outside reading for a course in 
test construction or in psychological measure- 
ment. The only shortcoming is in the discus- 
sion of the types of items that might be used 
in job information tests. While a reader un- 
familiar with the field will ultimately obtain 
some notion concerning the scope of possible 
items, in no single section is this aspect well 
developed. 

Edwin E. Ghiselli 


University of California, Berkeley 